Scheduling Backups En Masse with the Postgres Operator

Jonathan S. Katz

An important part of running a production PostgreSQL database system (and for that matter, any database software) is to ensure you are prepared for disaster. There are many ways to go about preparing your system for disaster, but one of the simplest and most effective ways to do this is by taking periodic backups of your database clusters.

How does one typically go about setting up a periodic backup? If you’re running PostgreSQL on a Linux-based system, the usual solution is cron: you set up a crontab entry similar to this in your superuser account:

# take a daily base backup at 1am to a mount point on an external disk
# using pg_basebackup
0 1 * * * /usr/bin/env pg_basebackup -D /your/external/mount/

However, if you’re managing tens, hundreds, or even thousands of PostgreSQL databases, this quickly becomes an onerous task, and you will need some automation to help you scale your disaster recovery safely and efficiently.

Automating Periodic Backups

The Crunchy PostgreSQL Operator, an application for managing PostgreSQL databases in a Kubernetes-based environment, is designed to manage thousands of PostgreSQL databases from a single interface and helps with challenges like the one above. One of the key features of the PostgreSQL Operator is its use of Kubernetes Labels to apply commands across many PostgreSQL databases. Later in this article, we will see how we can take advantage of labels to set backup policies across many clusters.
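
Labels can be attached to a cluster when it is created or added afterwards. As a purely illustrative sketch, assuming a cluster named “users” and that your version of the Operator supports the pgo label command and its --label flag, tagging a production database might look like this:

# hypothetical example: tag the "users" cluster as a production database
# (assumes the pgo label command and --label flag exist in your Operator version)
pgo label users --label=env=prod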

The PostgreSQL Operator 3.4.0 release introduced the ability to schedule tasks to run periodically at specified intervals. In this initial release, the PostgreSQL Operator allows you to schedule periodic backups of your database clusters using pgBackRest.

pgBackRest supports different types of backups, including “full backups” (a snapshot of the entire database) and “differential backups” (a snapshot of all the changes in the database since the last full backup). Given that a full backup for a large database can be, well, large, differential backups help to speed up the process of restoring a database during a recovery scenario. Thus, it is a good idea to take periodic full and differential backups of your database clusters, particularly if they are quite large.
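
To make the distinction concrete, here is roughly how these two backup types look when run directly with the pgBackRest command-line client; the stanza name “db” below is just a placeholder for whatever stanza your cluster is configured with:

# take a full backup (a complete snapshot of the database cluster)
pgbackrest --stanza=db --type=full backup

# take a differential backup (only the changes since the last full backup)
pgbackrest --stanza=db --type=diff backup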

For example, let’s say I have a database named “users” and my team has a policy where we need to take a nightly full backup of our databases and a differential backup every six hours. Using the PostgreSQL Operator, I can run the following commands:

# schedule a nightly full backup of my database at midnight
pgo create schedule --schedule="0 0 * * *" --schedule-type=pgbackrest \
  --pgbackrest-backup-type=full users

# schedule a differential backup of my database at the 6 hour marks except midnight
pgo create schedule --schedule="0 6,12,18 * * *" --schedule-type=pgbackrest \
  --pgbackrest-backup-type=diff users

The scheduling syntax should look very familiar: it is exactly the same syntax you would use when setting up a periodic backup with cron! Under the surface, the PostgreSQL Operator manages the schedules using the crunchy-scheduler container, which runs as a separate pod in the cluster.
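
If you want to confirm the scheduler is running, you can look for its pod with kubectl. This is only a sketch: the namespace below is an assumption, so use whichever namespace your Operator is actually deployed into:

# find the crunchy-scheduler pod (the "pgo" namespace is an assumption)
kubectl get pods -n pgo | grep scheduler

# tail the scheduler's logs to watch scheduled backups being kicked off
# (substitute the pod name returned by the command above)
kubectl logs -n pgo -f <crunchy-scheduler-pod-name>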

Applying Backup Policies Across Many Clusters

You may look at the above and conclude that it’s no different from managing cron on your own, and in my opinion you would be correct. Though creating a single schedule with the PostgreSQL Operator does encapsulate some of the pgBackRest arguments, it also introduces some overhead, and thus would be a bit more complicated to set up and manage than a single crontab.

But what if I have to apply this backup policy across hundreds of production PostgreSQL clusters? Enter Kubernetes Labels.

Let’s say I created a Kubernetes label called “env” that can hold one of three values: “dev”, “test”, and “prod”, representing a development, testing, and production database respectively. If I wanted to apply the same backup policy above to all of my production databases, I could run the following commands:

# schedule a nightly full backup of my database at midnight
pgo create schedule --schedule="0 0 * * *" --schedule-type=pgbackrest \
  --pgbackrest-backup-type=full --selector=env=prod

# schedule a differential backup of my database at the 6 hour marks except midnight
pgo create schedule --schedule="0 6,12,18 * * *" --schedule-type=pgbackrest \
  --pgbackrest-backup-type=diff --selector=env=prod

And that’s it: every single one of my production clusters now has that backup policy. I can verify that the policy was set across all of my production databases by running the following command:

pgo show schedule --selector=env=prod

If I wanted to remove all the backup policies for my production PostgreSQL clusters, I could run the following command:

pgo delete schedule --selector=env=prod

If I wanted to take full backups for all of my databases once a week on Sunday, I could run the following command:

pgo create schedule --schedule="0 0 * * SUN" --schedule-type=pgbackrest \
  --pgbackrest-backup-type=full --selector=all

Neat!

Ongoing Work & Closing Thoughts

It is generally helpful when managing something as critical as your backups to have simple, easy-to-use interfaces, which holds doubly true in an emergency situation where you need to restore from your backups (as I can personally attest). Even with automation, you should also periodically verify key pieces of your backup architecture, e.g. are your backups being saved, are they being saved to an appropriate offsite disk, can you restore from your backups, etc.
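
One quick way to spot-check that backups are actually landing in your repository is the pgBackRest info command, run from a host or container that can reach the repository; as in the earlier sketch, the stanza name “db” is only a placeholder:

# list the full and differential backups pgBackRest knows about for this stanza
# (the stanza name "db" is a placeholder; substitute your own)
pgbackrest --stanza=db info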

The upcoming Crunchy PostgreSQL Operator 3.5 release adds many improvements to its support for pgBackRest, as well as support in the scheduler for running jobs that execute SQL.

If you want to learn more about running PostgreSQL on Kubernetes, please visit https://www.crunchydata.com/products/crunchy-postgresql-for-kubernetes