Metadata-Version: 2.1
Name: data-diff
Version: 0.8.0
Summary: Command-line tool and Python library to efficiently diff rows across two different databases.
Home-page: https://github.com/datafold/data-diff
License: MIT
Author: Datafold
Author-email: data-diff@datafold.com
Requires-Python: >=3.7.2,<4.0.0
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.7
Classifier: Topic :: Database :: Database Engines/Servers
Classifier: Typing :: Typed
Provides-Extra: clickhouse
Provides-Extra: duckdb
Provides-Extra: mysql
Provides-Extra: oracle
Provides-Extra: postgresql
Provides-Extra: preql
Provides-Extra: presto
Provides-Extra: redshift
Provides-Extra: snowflake
Provides-Extra: trino
Provides-Extra: vertica
Requires-Dist: click (>=8.1,<9.0)
Requires-Dist: clickhouse-driver ; extra == "clickhouse"
Requires-Dist: cryptography ; extra == "snowflake"
Requires-Dist: cx_Oracle ; extra == "oracle"
Requires-Dist: dbt-artifacts-parser (>=0.3.0,<0.4.0)
Requires-Dist: dbt-core (>=1.0.0,<2.0.0)
Requires-Dist: dsnparse (<0.2.0)
Requires-Dist: duckdb (>=0.7.0,<0.8.0) ; extra == "duckdb"
Requires-Dist: keyring
Requires-Dist: mysql-connector-python (==8.0.29) ; extra == "mysql"
Requires-Dist: preql (>=0.2.19,<0.3.0) ; extra == "preql"
Requires-Dist: presto-python-client ; extra == "presto"
Requires-Dist: psycopg2 ; extra == "postgresql" or extra == "redshift"
Requires-Dist: rich
Requires-Dist: runtype (>=0.2.6,<0.3.0)
Requires-Dist: snowflake-connector-python (>=3.0.2,<4.0.0) ; extra == "snowflake"
Requires-Dist: tabulate (>=0.9.0,<0.10.0)
Requires-Dist: toml (>=0.10.2,<0.11.0)
Requires-Dist: trino (>=0.314.0,<0.315.0) ; extra == "trino"
Requires-Dist: urllib3 (<2)
Requires-Dist: vertica-python ; extra == "vertica"
Project-URL: Repository, https://github.com/datafold/data-diff
Description-Content-Type: text/markdown

<p align="left">
    <img alt="Datafold" src="https://user-images.githubusercontent.com/1799931/196497110-d3de1113-a97f-4322-b531-026d859b867a.png" width="30%" />
</p>

<h1 align="left">
data-diff: compare datasets fast, within or across SQL databases
</h1>

<br>


# Use cases

## Data Migration & Replication Testing
Compare source to target and check for discrepancies when moving data between systems:
- Migrating to a new data warehouse (e.g., Oracle > Snowflake)
- Converting SQL to a new transformation framework (e.g., stored procedures > dbt)
- Continuously replicating data from an OLTP DB to OLAP DWH (e.g., MySQL > Redshift)


Install `data-diff` with specific database adapters, e.g.:

```
pip install data-diff 'data-diff[postgresql,snowflake]' -U
```
Run `data-diff` with connection URIs to compare tables:
```
data-diff \
  postgresql://<username>:'<password>'@localhost:5432/<database> \
  <table> \
  "snowflake://<username>:<password>@<ACCOUNT>/<DATABASE>/<SCHEMA>?warehouse=<WAREHOUSE>&role=<ROLE>" \
  <TABLE> \
  -k activity_id \
  -c activity \
  -w "event_timestamp < '2022-10-10'"
```
Check out the [documentation](https://docs.datafold.com/reference/open_source/cli) for the full command reference.

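When the comparison has to run from a script or a scheduled job, the invocation above can be assembled as an argument list instead of a hand-quoted shell string. The following is a minimal sketch using only the Python standard library. The helper name `build_data_diff_command`, the connection URIs, and the table names are hypothetical placeholders; only the `-k`, `-c`, and `-w` flags are taken from the command shown above, and the sketch assumes `-c` may be passed once per extra column.

```python
import shlex
from typing import List, Optional, Sequence


def build_data_diff_command(
    db1_uri: str,
    table1: str,
    db2_uri: str,
    table2: str,
    key_column: str,
    extra_columns: Sequence[str] = (),
    where: Optional[str] = None,
) -> List[str]:
    """Return an argv list mirroring the `data-diff` CLI call shown above.

    Hypothetical helper for illustration; it is not part of data-diff.
    """
    argv = ["data-diff", db1_uri, table1, db2_uri, table2, "-k", key_column]
    # Assumption: one `-c` flag per extra column to compare.
    for column in extra_columns:
        argv += ["-c", column]
    if where is not None:
        argv += ["-w", where]
    return argv


# Placeholder credentials, account, and table names (not real endpoints).
command = build_data_diff_command(
    "postgresql://user:secret@localhost:5432/analytics",
    "activity_log",
    "snowflake://user:secret@my_account/ANALYTICS/PUBLIC?warehouse=COMPUTE_WH&role=ANALYST",
    "ACTIVITY_LOG",
    key_column="activity_id",
    extra_columns=["activity"],
    where="event_timestamp < '2022-10-10'",
)

# Quote each argument so the printed string is safe to paste into a POSIX shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

Passing `command` to `subprocess.run(command, check=True)` would execute it without going through a shell, which avoids quoting problems with passwords and the `-w` filter expression.
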
## Data Development Testing
Test SQL code and preview changes by comparing development/staging environment data to production:
1. Make a change to some SQL code
2. Run the SQL code to create a new dataset
3. Compare the dataset with its production version or another iteration

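To make step 3 concrete, here is a deliberately naive sketch of what comparing a development dataset with its production version means. It uses an in-memory SQLite database from the Python standard library purely for illustration. The table names, the sample rows, the `naive_diff` helper, and the `+`/`-` markers are all assumptions made for this example; it loads both tables fully into memory and is not how `data-diff` itself performs the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prod_activity (activity_id INTEGER PRIMARY KEY, activity TEXT);
    CREATE TABLE dev_activity  (activity_id INTEGER PRIMARY KEY, activity TEXT);
    INSERT INTO prod_activity VALUES (1, 'login'), (2, 'purchase'), (3, 'logout');
    INSERT INTO dev_activity  VALUES (1, 'login'), (2, 'refund'),   (4, 'signup');
    """
)


def naive_diff(conn, table_a, table_b, key, columns):
    """Yield ('-', row) for rows only in table_a and ('+', row) for rows only in table_b.

    Identifiers are interpolated into the SQL text, so pass trusted names only.
    """
    cols = ", ".join([key, *columns])
    rows_a = set(conn.execute(f"SELECT {cols} FROM {table_a}"))
    rows_b = set(conn.execute(f"SELECT {cols} FROM {table_b}"))
    for row in sorted(rows_a - rows_b):
        yield ("-", row)
    for row in sorted(rows_b - rows_a):
        yield ("+", row)


diff = list(naive_diff(conn, "prod_activity", "dev_activity", "activity_id", ["activity"]))
for sign, row in diff:
    print(sign, row)
# - (2, 'purchase')
# - (3, 'logout')
# + (2, 'refund')
# + (4, 'signup')
```

A changed row shows up as a `-`/`+` pair with the same key (here `activity_id = 2`), while rows that exist on only one side appear once.
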
  <p align="left">
  <img alt="dbt" src="https://seeklogo.com/images/D/dbt-logo-E4B0ED72A2-seeklogo.com.png" width="10%" />
  </p>
  
`data-diff` integrates with dbt Core and dbt Cloud to seamlessly compare local development to production datasets. 

:eyes: **Watch [4-min demo video](https://www.loom.com/share/ad3df969ba6b4298939efb2fbcc14cde)**

**[Get started with data-diff & dbt](https://docs.datafold.com/development_testing/open_source)**

Reach out on the dbt Slack in [#tools-datafold](https://getdbt.slack.com/archives/C03D25A92UU) for advice and support.

## Supported databases

- PostgreSQL >=10
- MySQL
- Snowflake
- BigQuery
- Redshift
- Oracle
- Presto
- Databricks
- Trino
- Clickhouse
- Vertica
- DuckDB >=0.6
- SQLite (coming soon)


<br>

## Contributors

We thank everyone who has contributed so far!

<a href="https://github.com/datafold/data-diff/graphs/contributors">
  <img src="https://contributors-img.web.app/image?repo=datafold/data-diff" />
</a>

<br>

## Analytics

* [Usage Analytics & Data Privacy](https://github.com/datafold/data-diff/blob/master/docs/usage_analytics.md)

<br>

## License

This project is licensed under the terms of the [MIT License](https://github.com/datafold/data-diff/blob/master/LICENSE).

