Crunchy Bridge for Analytics now has support for Iceberg and other new features. Learn more in our Crunchy Bridge for Analytics announcement.

Introducing pgCompare: The Ultimate Multi-Database Data Comparison Tool

Brian Pace

5 min read

In the evolving world of data management, ensuring consistency and accuracy across multiple database systems is paramount. Whether you're migrating data, synchronizing systems, or performing routine audits, the ability to compare data across different database platforms is crucial. Enter pgCompare, an open-source tool designed to simplify and enhance the process of data comparison across PostgreSQL, Oracle, MySQL, and MSSQL databases.

The key features of pgCompare:

  • Multi-Database support: pgCompare stands out with its ability to connect and compare data across four major database systems: PostgreSQL, Oracle, MySQL, and MSSQL. This multi-database support is crucial for organizations managing a variety of database technologies.
  • Comparison report: pgCompare generates detailed reports highlighting the differences between datasets. These reports include information about missing records, mismatched values, and summary statistics, enabling users to quickly identify and address inconsistencies.
  • Stored results: Results are stored in a Postgres database for tracking historical compares, current status, and alerting.
  • Flexible comparison options: Users can customize their comparisons with various options such as transforming data and excluding specific columns. This flexibility ensures that comparisons are tailored to meet specific requirements.
  • Performance and scalability: Built with performance in mind, pgCompare efficiently handles large datasets with minimal impact to source and target systems. Its flexible architecture ensures that it can meet the needs of both small and large datasets.

Getting Started with pgCompare

PgCompare is an open source tool, free to use for anyone, and getting started with pgCompare is simple. The tool can be downloaded from the official git repository, https://github.com/CrunchyData/pgCompare, where users will find detailed documentation and tutorials to help them configure and run their first comparisons. With its robust feature set and ease of use, pgCompare is set to become an indispensable tool for database professionals.

pgCompare runs as an application in the location of your choice, either a local machine or remote one closer to your data store. pgCompare creates a separate Postgres database for running the queries to fetch data from your remote data stores. You’ll configure the details for your comparison in the dc_table.

After compiling the Java source code (see readme for details), the first step is to copy the pgcompare.properties.sample file topgcompare.properties and make the necessary edits for the repository, target, and source databases. With the properties file in place, use pgcompare to initialize the repository.

java -jar pgcompare.jar --init

There is a sample table available in the database directory of the git repository. If you do not have tables already in place, deploy the HR.EMP table to the source and target database of your choice.

Last step before executing a compare is to register the tables with the pgCompare repository. To do this, simply execute pgCompare with the discovery flag followed by the schema it should perform the discovery against (hr in this example).

java -jar pgcompare.jar --discovery hr

To compare the databases, you’ll run something like this:

java -jar pgcompare.jar --batch=0

The summary output of the compare will appear at the end of the job:

Reconciliation Complete:  Table = emp; Equal = 21; Not Equal = 1; Missing Source = 1; Missing Target = 0
Processed 1 tables
Table Summary: Table = emp                           ; Status = out-of-sync ; Equal =                  21; Not Equal =                   1; Missing Source =                   1; Missing Target =                   0
Run Summary:  Elapsed Time (seconds) = 7; Total Rows Processed = 23; Total Out-of-Sync = 2; Through-put (rows/per second) = 3

Last, if there are out of sync rows, details on each row as well as a revalidation can be performed using the check option:

java -jar pgcompare.jar --batch=0 --check

The details on out of sync rows will appear as the check is performed:

Primary Key: {"eid":23}
  Out-of-Sync:  PK = {"eid": 23};  Differences = [{"LAST_NAME":{"source":"Runner","target":"Pace"}}]
Primary Key: {"eid":22}
  Out-of-Sync:  PK = {"eid": 22};  Differences = ["Missing Source"]

Use Cases for pgCompare

Data Migration

When migrating data from one database platform to another, ensuring that all records have been accurately transferred is critical. For example, the Crunchy database migration team uses this tool to validate data during Oracle to Postgres migrations. It is also useful to create a data validation artifact that verifies data consistency before decommissioning old systems.

Data Synchronization

For organizations that run multiple databases concurrently, maintaining synchronization between these systems is essential. The demand for active/active configuration continues to grow. These solutions use logical replication which introduces risk. To control this risk, introduce a compensating control, pgCompare helps in regularly checking and syncing data across different databases.

Regulatory Compliance

Many industries require regular audits to ensure data accuracy and compliance with regulations. pgCompare simplifies the auditing process by providing clear and detailed comparison reports. Auditors and regulators always require evidence that data divergence is not occurring. The output from pgCompare is useful in meeting this requirement.

Quality Assurance

In development and testing environments, pgCompare can be used to verify that data remains consistent across various stages of application development and deployment. If testing is performed against incorrect or outdated data, it could add risk for production releases.

Why pgCompare is a game changer

The traditional methods of data comparison often involve manual processes or scripts that are prone to errors and require significant maintenance. Many solutions rely on comparing row counts which does not prove the data is indeed equal. pgCompare revolutionizes this process by providing a reliable, automated solution that reduces the risk of errors and saves valuable time.

  • Efficiency: Automating data comparison reduces the time and effort required for manual checks, allowing database administrators and data engineers to focus on more strategic tasks.
  • Accuracy: By leveraging advanced algorithms, pgCompare ensures precise identification of discrepancies, enhancing data integrity.
  • Integration: With support for multiple databases, pgCompare seamlessly integrates into diverse IT environments, making it a versatile tool for any organization.

In a world where data accuracy and consistency are paramount, pgCompare offers a reliable, efficient, and scalable solution for comparing data across PostgreSQL, Oracle, MySQL, and MSSQL databases. Whether you're a database administrator, data engineer, or IT manager, pgCompare is the tool you need to ensure your data remains consistent and reliable.

Embrace the future of data comparison with pgCompare and transform the way you manage your multi-database environment.

Avatar for Brian Pace

Written by

Brian Pace

May 31, 2024 More by this author