Introduction to Postgres Backups
Backups in the database world are essential. They are the safety net protecting you from even the smallest bit of data loss. There’s a variety of ways to back up your data and this post aims to explain the basic tools involved in backups and what options you have, from just getting started to more sophisticated production systems.
pg_dump and pg_dumpall are tools designed to generate a file from which a database can later be restored. These are classified as logical backups, and they can be much smaller in size than physical backups. This is due, in part, to the fact that indexes are not stored in the SQL dump. Only the CREATE INDEX command is stored, and indexes must be rebuilt when restoring from a logical backup.
One advantage of the SQL dump approach is that the output can generally be reloaded into newer versions of Postgres so dump and restores are very popular for version upgrades and migrations. Another advantage is that these tools can be configured to back up specific database objects and ignore others. This is helpful, for example, if only a certain subset of tables need to be brought up in a test environment. Or you want to back up a single table as you do some risky work.
Postgres dumps are also internally consistent, which means the dump represents a snapshot of the database at the time the process started. Dumps will usually not block other operations, but they can be long-running (e.g. several hours or days, depending on hardware and database size). Because of the method Postgres uses to implement concurrency, known as Multiversion Concurrency Control (MVCC), long-running backups may cause Postgres to experience performance degradation until the dump completes.
To dump a single database table, you can run something like:
pg_dump -t my_table my_database > table.sql
To restore it, run something like:
psql -d my_database -f table.sql
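Beyond plain SQL files, pg_dump's custom format pairs with pg_restore and supports parallel restore jobs. A minimal sketch, with placeholder database names:

```shell
# Dump in custom format (compressed by default, restorable only via pg_restore)
$ pg_dump -Fc my_database > my_database.dump

# Restore into a fresh database using 4 parallel jobs
$ createdb my_database_restored
$ pg_restore -j 4 -d my_database_restored my_database.dump
```

Parallel restore helps most with the index rebuilds mentioned above, since those dominate restore time on large databases.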
pg_dump as a corruption check
pg_dump sequentially scans through the entire data set as it creates the file.
Reading the entire database is a rudimentary corruption check for all the table
data, but not for indexes. If your data is corrupted, pg_dump will throw an
exception. Crunchy generally recommends using the amcheck module to do a corruption check, especially during some kind of upgrade or migration where index corruption might otherwise go unnoticed.
Server & file system backups
If you’re coming from the Linux admin world, you’re used to backup options for
the entire machine your database runs on, using
rsync or another tool.
Postgres cannot be safely backed up with file-oriented tools while it’s running, and there’s no simple way to quiesce writes either. To get the database into a state where you can rsync the data, you either have to shut it down or go
through all the work of setting up change archiving. There are also some other
options for storage layers that support snapshots for the entire data
directory - but read the fine print on these.
Physical Backups & WAL archiving
Beyond basic dump files, the more sophisticated methods of Postgres backup all
depend on saving the database’s Write-Ahead-Log (WAL) files. WAL tracks changes
to all the database blocks, saving them into segments that default to 16MB in
size. The continuous set of a server’s WAL files is referred to as its WAL stream. You have to start archiving the WAL stream’s files before you can safely
copy the database, followed by a procedure that produces a “Base Backup”, i.e.
pg_basebackup. The incremental aspect of WAL makes possible a series of other
restoration features lumped under the banner of Point-In-Time Recovery (PITR).
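As a sketch of what archiving setup involves, the relevant postgresql.conf settings might look like this (the archive path is a placeholder, and production systems typically use a purpose-built archiving tool rather than plain cp):

```ini
# postgresql.conf - minimal WAL archiving sketch
wal_level = replica
archive_mode = on
# Refuse to overwrite an existing archived segment, then copy it
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
```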
Create a basebackup with pg_basebackup:
You can use something like this:
$ sudo -u postgres pg_basebackup -h localhost -p 5432 -U postgres \
  -D /var/lib/pgsql/15/backups -Ft -z -Xs -P -c fast
A few comments on the command above.
- This command should be run as the postgres user.
- The -D parameter specifies where to save the backup.
- The -Ft parameter indicates the tar format should be used.
- The -Xs parameter indicates that WAL files will stream to the backup. This is important because substantial WAL activity could occur while the backup is taken, and you may not want to retain those files on the primary during this period. This is the default behavior, but worth pointing out.
- The -z parameter indicates that the tar files will be compressed.
- The -P parameter indicates that progress information is written to stdout during the process.
- The -c fast parameter indicates that a checkpoint is taken immediately. If this parameter is not specified, the backup will not begin until Postgres issues a checkpoint on its own, and that could take a significant amount of time.
Once the command is entered, the backup should begin immediately. Depending upon the size of the cluster, it may take some time to finish. However, it will not interrupt any other connections to the database.
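Since PostgreSQL 13, pg_verifybackup can check a base backup against its backup_manifest. For a tar-format backup like the one above, the archives have to be extracted first; a sketch with placeholder paths:

```shell
# Extract the backup and its manifest into a scratch directory
$ sudo -u postgres mkdir /tmp/backup_check
$ sudo -u postgres tar -xf /var/lib/pgsql/15/backups/base.tar.gz -C /tmp/backup_check
$ sudo -u postgres cp /var/lib/pgsql/15/backups/backup_manifest /tmp/backup_check/

# -n skips WAL parsing, since the WAL was archived separately
$ sudo -u postgres pg_verifybackup -n /tmp/backup_check
```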
Steps to restore from a backup taken with pg_basebackup
These steps are simplified from the official documentation. If you are using features like tablespaces, you will need to modify them for your environment.
- Ensure the database is shut down.
$ sudo systemctl stop postgresql-15.service
$ sudo systemctl status postgresql-15.service
- Remove the contents of the Postgres data directory to simulate the disaster.
$ sudo rm -rf /var/lib/pgsql/15/data/*
- Extract base.tar.gz into the data directory.
$ sudo -u postgres ls -l /var/lib/pgsql/15/backups
total 29016
-rw-------. 1 postgres postgres   182000 Nov 23 21:09 backup_manifest
-rw-------. 1 postgres postgres 29503703 Nov 23 21:09 base.tar.gz
-rw-------. 1 postgres postgres    17730 Nov 23 21:09 pg_wal.tar.gz
$ sudo -u postgres tar -xvf /var/lib/pgsql/15/backups/base.tar.gz \
  -C /var/lib/pgsql/15/data
- Extract pg_wal.tar.gz into a new directory outside the data directory. In our case, we create a directory called pg_wal inside our backups directory.
$ sudo -u postgres mkdir -p /var/lib/pgsql/15/backups/pg_wal
$ sudo -u postgres tar -xvf /var/lib/pgsql/15/backups/pg_wal.tar.gz \
  -C /var/lib/pgsql/15/backups/pg_wal/
- Create the recovery.signal file.
$ sudo -u postgres touch /var/lib/pgsql/15/data/recovery.signal
- Set the restore_command in postgresql.conf to copy the WAL files streamed during the backup.
$ echo "restore_command = 'cp /var/lib/pgsql/15/backups/pg_wal/%f %p'" | \
  sudo tee -a /var/lib/pgsql/15/data/postgresql.conf
- Start the database.
$ sudo systemctl start postgresql-15.service
$ sudo systemctl status postgresql-15.service
- Now your database is up and running based on the information contained in the previous base backup.
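One way to confirm that recovery finished is to ask the server directly; pg_is_in_recovery() returns false once the database has been promoted and accepts writes:

```shell
$ sudo -u postgres psql -c "select pg_is_in_recovery();"
```

The recovery.signal file is also removed automatically when recovery completes.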
Automating physical backups
Building upon pg_basebackup, you could write a series of scripts to use
this backup, add WAL segments to it, and manage a complete physical backup
scenario. There are several tools out there including WAL-E, WAL-G, and
pgBackRest that will do all this for you. WAL-G is the next generation of WAL-E
and works for quite a few other databases including MySQL and Microsoft SQL
Server. WAL-G is also used extensively at the enterprise level with some large
Postgres environments, including Heroku. When we first built Crunchy Bridge, we
had a choice between WAL-G and pgBackRest since we employ the maintainers of
both and each has its perks. In the end, we selected pgBackRest.
pgBackRest is the best in class backup tool out there. There are a number of very large Postgres environments relying on pgBackRest, including our own Crunchy Bridge, Crunchy for Kubernetes, and Crunchy Postgres as well as countless other projects in the Postgres ecosystem.
pgBackRest can perform three types of backups:
- Full backups, which copy the entire contents of the database cluster to the backup.
- Differential backups, which copy only the database cluster files that have changed since the last full backup.
- Incremental backups, which copy only the database cluster files that have changed since the last full, differential, or incremental backup.
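Each of these maps to a --type flag on the backup command; a sketch, assuming a stanza named demo:

```shell
$ sudo -u postgres pgbackrest --stanza=demo --type=full backup
$ sudo -u postgres pgbackrest --stanza=demo --type=diff backup
$ sudo -u postgres pgbackrest --stanza=demo --type=incr backup
```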
pgBackRest has some special features like:
- Point-in-Time Recovery (PITR), allowing you to restore the database to a specific moment in time.
- Delta restore, which reuses database files already present on disk and updates only what has changed. This makes restores much faster, especially if you have a large database and don’t want to re-copy the entire thing.
- Multiple backup repositories - say, one local and one remote for redundancy.
Concerning archiving, users can set the archive_command parameter to have pgBackRest copy WAL files to an external archive. These files can be retained indefinitely or expired in accordance with your organization’s data retention policy.
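A typical pairing, sketched with placeholder paths and stanza name, points archive_command at pgBackRest and sets a retention policy in pgbackrest.conf:

```ini
# postgresql.conf
archive_mode = on
archive_command = 'pgbackrest --stanza=demo archive-push %p'

# /etc/pgbackrest/pgbackrest.conf
[demo]
pg1-path=/var/lib/pgsql/15/data

[global]
repo1-path=/var/lib/pgbackrest
# Keep the two most recent full backups, plus the WAL needed to restore them
repo1-retention-full=2
```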
To start pgBackRest after installation, you’ll run something like this:
$ sudo -u postgres pgbackrest --stanza=demo --log-level-console=info stanza-create
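Before taking the first backup, the check command verifies that archiving and the repository are working, and info summarizes the backups held in the repository:

```shell
$ sudo -u postgres pgbackrest --stanza=demo --log-level-console=info check
$ sudo -u postgres pgbackrest info
```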
To do a delta restore:
$ sudo systemctl stop postgresql-15.service
$ sudo -u postgres pgbackrest \
  --stanza=db --delta \
  --type=time "--target=2022-09-01 00:00:05.010329+00" \
  --target-action=promote restore
When the restore completes, you restart the database and verify that the users table is back.
$ sudo systemctl start postgresql-15.service
$ sudo -u postgres psql -c "select * from users limit 1"
pgBackRest has extensive settings and configurations for building a strategy specific to your needs. The right approach depends on your recovery point objective, time to restore, available storage, and the IO overhead you can tolerate on the source database; finding the right strategy for your use case is a matter of striking a balance between these.
Our usual recommendation is to combine the backup and WAL archival capabilities of pgBackRest. We usually recommend that customers take a weekly full base backup in addition to their continuous archiving of WAL files, and consider whether other incremental backup forms (maybe even pg_dump) make sense for their requirements.
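That cadence can be scheduled with cron; a sketch, assuming a stanza named demo:

```shell
# /etc/crontab sketch: full backup Sunday night, incrementals the other nights
0 2 * * 0    postgres  pgbackrest --stanza=demo --type=full backup
0 2 * * 1-6  postgres  pgbackrest --stanza=demo --type=incr backup
```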
Choosing the backup tool for your use case will be a personal choice based on your needs, tolerance for recovery time, and available storage. In general, it is best to think of pg_dump as a utility for doing specific database tasks. pg_basebackup can be an option if you’re ok with single physical backups taken on a set schedule. If you have a production system of any size and need a disaster recovery plan, it’s best to implement pgBackRest or a similarly sophisticated tool that layers WAL segments on top of a base backup. Of course, there are fully managed options out there like Crunchy Bridge which will handle all this for you.
Co-authored with Elizabeth Christensen
March 9, 2023