Postgres 18 New Default for Data Checksums and How to Deal with Upgrades

Greg Sabino Mullane

Thanks to a recent patch authored by Greg Sabino Mullane, Postgres 18 takes a step forward for data integrity: data checksums are now enabled by default.

This appears in the release notes as a fairly minor change, but it significantly strengthens the defense against one of the sneakiest problems in data management: silent data corruption.

Let’s dive into what this feature is, what the new default means for you, and how it impacts upgrades.

What is a data checksum?

A data checksum is a simple but powerful technique to verify the integrity of data pages stored on disk. It's like a digital fingerprint for every 8KB block of data (a "page") in your database.

  • Creation: When Postgres writes a data page (for a table or an index) to disk, it runs an algorithm on the page's contents to calculate a small derived value, the checksum.
  • Storage: This checksum is stored in the page header alongside the data.
  • Verification: Whenever Postgres reads that page back from disk, it immediately recalculates the checksum from the data and compares it to the stored value.

If the two values do not match, the data page has been altered or corrupted since it was last written. This matters because data corruption can happen silently. By detecting a mismatch, Postgres can immediately raise an error and alert you to a potential problem. Checksums are also an integral part of pgBackRest, which uses them to verify backups.
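The write-then-verify cycle can be illustrated with a toy sketch. This is not how Postgres computes page checksums: it uses the standard cksum utility as a stand-in algorithm, a temporary file as a stand-in for an 8KB page, and a hypothetical helper named verify_page.

```shell
#!/bin/sh
# Toy illustration only: cksum's CRC is NOT the algorithm Postgres uses.

# Recompute the checksum of file $1 and compare it with stored value $2.
verify_page() {
    current=$(cksum < "$1" | cut -d ' ' -f 1)
    if [ "$current" = "$2" ]; then
        echo "ok"
    else
        echo "checksum mismatch"
    fi
}

# "Write" an 8KB page and remember its checksum.
page=$(mktemp)
head -c 8192 /dev/zero | tr '\0' 'a' > "$page"
stored=$(cksum < "$page" | cut -d ' ' -f 1)

verify_page "$page" "$stored"    # prints: ok

# Simulate silent corruption by overwriting one byte in place.
printf 'X' | dd of="$page" bs=1 seek=100 conv=notrunc 2>/dev/null

verify_page "$page" "$stored"    # prints: checksum mismatch

rm -f "$page"
```

The point of the sketch is the comparison step: nothing reports the damaged byte on its own, and the corruption only surfaces because the recomputed value no longer matches the stored one.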

What is initdb and why does it matter?

The initdb command is the utility that creates a new Postgres database cluster and initializes the data directory where Postgres stores all of its permanent data. When you run initdb, it does things like:

  1. create the directory structure
  2. create the template databases like template1 and postgres
  3. populate the initial system catalog tables
  4. create the initial version of the server configuration files
  5. enable and start keeping track of checksums

The syntax often looks something like this:

/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data

As an end user of cloud-managed Postgres or even a local tool like Postgres.app, you generally never see the initdb command because it is a one-time administrative setup task.

The new default --data-checksums for initdb

In the past, database admins had to manually add the --data-checksums flag when running initdb to enable this feature. If you forgot or didn’t know about this feature, the new cluster was created without these built-in integrity checks.

The default behavior of initdb is now to enable data checksums every time a new cluster is initialized.

  • old command - checksums OFF by default: initdb -D /data/pg14
  • new default command - checksums ON by default: initdb -D /data/pg18

This is generally a win for Postgres best practices. Every new database cluster is now automatically equipped with this corruption defense, requiring no extra effort.

--no-data-checksums

If you have a specific reason to disable checksums, you can explicitly opt out using the new flag:

initdb --no-data-checksums -D /data/pg18
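To confirm which setting a cluster ended up with, one option is the "Data page checksum version" line printed by pg_controldata, where 0 means checksums are off. The sketch below wraps that check in a hypothetical helper named checksum_state and feeds it a sample line, so it runs without a Postgres installation. On a running server, SHOW data_checksums; in psql reports the same thing.

```shell
#!/bin/sh
# Reads pg_controldata output on stdin and prints "on" or "off".
# A "Data page checksum version" of 0 means checksums are disabled.
checksum_state() {
    awk -F: '/^Data page checksum version/ {
        gsub(/[ \t]/, "", $2)
        print ($2 == "0" ? "off" : "on")
    }'
}

# Sample input so the sketch runs anywhere. Against a real cluster
# (the path is a placeholder) the call would be:
#   pg_controldata -D /data/pg18 | checksum_state
printf 'Data page checksum version:           1\n' | checksum_state   # prints: on
```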

Checksums and pg_upgrade

While the new default is great, it may introduce a compatibility issue for those doing a major version upgrade using the pg_upgrade utility.

pg_upgrade works by moving data from an old data directory into a newly initialized one, and a fundamental requirement is that both clusters have the same checksum setting: either both ON or both OFF.

If you are upgrading an older Postgres cluster that was created before this change, chances are it has checksums disabled, and pg_upgrade will fail because the settings do not match.

If you are in a pinch and need to upgrade a cluster that does not have checksums enabled, you can pass the new --no-data-checksums flag when initializing the new cluster so that the settings align.
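One way to avoid the mismatch is to derive the initdb flag from the old cluster's setting before creating the new cluster. The following is a minimal sketch: initdb_checksum_flag is a hypothetical helper, the data and binary paths are placeholders, and the old cluster's state is hard-coded so the example runs without a server.

```shell
#!/bin/sh
# Maps the old cluster's data_checksums setting ("on" or "off") to the
# initdb flag that gives the new cluster the same setting.
initdb_checksum_flag() {
    case "$1" in
        on)  echo "--data-checksums" ;;
        off) echo "--no-data-checksums" ;;
        *)   echo "unknown checksum state: $1" >&2; return 1 ;;
    esac
}

# While the old server is still running, its setting could be read with:
#   old_state=$(psql -At -c 'SHOW data_checksums')
old_state=off   # hard-coded here so the sketch runs without a server

flag=$(initdb_checksum_flag "$old_state")
echo "initdb $flag -D /data/pg18"   # prints: initdb --no-data-checksums -D /data/pg18

# After initdb, a dry run reports incompatibilities without changing data
# (the binary directories below are placeholders):
#   pg_upgrade --check -b /path/to/old/bin -B /path/to/new/bin \
#              -d /data/pg14 -D /data/pg18
```

The sketch only prints the initdb command rather than running it, so the result can be reviewed first.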

Upgrading an existing Postgres database to checksums

Instead of continuing forever with no data checksums, the better long-term solution is to add checksums to your database before the next upgrade. Sadly, there’s really no way to do this on a single server without some downtime and a restart, and on a large database the process can be slow. The pg_checksums utility, which is well documented, handles this.

We have helped a few folks with this issue. For larger environments that cannot tolerate downtime, you can add the checksums on a replica machine and then fail over to that.
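The offline procedure might look like the sketch below. The data directory is a placeholder, and the run and enable_checksums helpers are hypothetical. Because DRY_RUN defaults to 1, the commands are only printed, not executed. pg_checksums requires a cleanly shut down cluster, which is where the downtime comes from.

```shell
#!/bin/sh
# Sketch: enable checksums on an existing cluster while it is offline.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

enable_checksums() {
    datadir=$1
    # pg_checksums refuses to run unless the server was shut down cleanly.
    run pg_ctl stop -D "$datadir" -m fast || return 1
    # Scans the cluster and rewrites blocks in place; slow on large clusters.
    run pg_checksums --enable --progress -D "$datadir" || return 1
    run pg_ctl start -D "$datadir" || return 1
}

enable_checksums /data/pg14
```

Setting DRY_RUN=0 would execute the commands for real, so it is worth rehearsing on a copy of the cluster first to estimate how long the outage will last.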

Summary

Postgres checksums are a great feature, and they are now the default for new clusters. If you haven’t used checksums in the past, you may want to start planning now for adding them, especially since a self-managed major version upgrade will require a bit of extra thinking.