pgBackRest Point-In-Time Recovery Using Crunchy PostgreSQL Operator

Chirag Dave

7 min readMore by this author

The Crunchy PostgreSQL Operator is a controller that runs within a Kubernetes cluster that provides a means to deploy and manage PostgreSQL clusters.

pgBackRest is one of the most simple yet powerful backup and restore tools, with a bunch of rich features especially for backing up large databases.

At Crunchy Data we have made significant improvements in the PostgreSQL Operator functionality especially for integrating pgBackRest support. This blog post will demonstrate how pgBackRest is well integrated into the PostgreSQL Operator and can be used to backup and restore a database to any point-in-time, a technique that is known as "PITR."

Please follow this link to install Crunchy PostgreSQL Operator. In my setup I am running version 3.5.2 in a local OpenShift environment, but provide examples that work in any Kubernetes-based environment.

$ pgo version
pgo client version 3.5.2
pgo-apiserver version 3.5.2

1. Create a cluster with pgBackRest

$ pgo create cluster mycluster --pgbackrest
created Pgcluster mycluster
workflow id 34c1d770-9ea6-4c9c-974c-113c8e2a45db

If you notice, after the command ran, it gives you a workflow id and with that you can now track progress and it is very useful if you are trying to automate to deploy a PostgreSQL cluster.

2. pgo show workflow

34c1d770-9ea6-4c9c-974c-113c8e2a45db
parameter value
--------- -----
workflowid 34c1d770-9ea6-4c9c-974c-113c8e2a45db
pg-cluster mycluster
task completed 2019-03-28.17.24.50
task submitted 2019-03-28.17.24.07

Here you can see the time at which a task was submitted and completed etc.

3. Show cluster all

$ pgo show cluster all
cluster : mycluster (crunchy-postgres:centos7-11.2-2.3.1)
pod : mycluster-6d6974c44d-jbslq (Running) on openshift-demo-app-1 (1/1) (primary)
pvc : mycluster
resources : CPU Limit= Memory Limit=, CPU Request= Memory Request=
storage : Primary=2G Replica=2G
deployment : mycluster
deployment : mycluster-backrest-shared-repo
service : mycluster - ClusterIP (172.30.13.79)
labels : pgo-backrest=true pgo-version=3.5.2 archive-timeout=60 current-primary=mycluster deployment-name=mycluster pg-cluster=mycluster primary=true archive=true crunchy-pgbadger=false crunchy_collect=false name=mycluster

(Note: In PostgreSQL Operator 4.0, this command is pgo show cluster --all).

4. Take a pgBackRest backup

pgBackRest by default will take full backup the first time you run it:

$ pgo backup mycluster --backup-type=pgbackrest
created Pgtask backrest-backup-mycluster

$ pgo show backup mycluster --backup-type=pgbackrest
backrest : mycluster
stanza: db
status: ok
cipher: none
db (current)
wal archive min/max (11-1): none present
full backup: 20190328-172543F
timestamp start/stop: 2019-03-28 17:25:43 / 2019-03-28 17:26:02
wal start/stop: 000000010000000000000002 / 000000010000000000000002
database size: 30MB, backup size: 30MB
repository size: 3.6MB, repository backup size: 3.6MB

$ kubectl get pod
NAME READY STATUS RESTARTS AGE
backrest-backup-mycluster-6rqsw 0/1 Completed 0 56s
mycluster-6d6974c44d-jbslq 1/1 Running 0 2m
mycluster-backrest-shared-repo-9d88cf646-fjtkx 1/1 Running 0 2m
mycluster-stanza-create-wsj7m 0/1 Completed 0 1m
postgres-operator-5876b4b65d-9p5sl 3/3 Running 0 25m

(If you are on OpenShift, you can use oc get pod)

Next, let’s create a scenario where you have an important table and it is being written and gets dropped accidentally. After the table is dropped I will be creating a new table and insert some data into it.

kubectl exec -it mycluster-6d6974c44d-jbslq
sh-4.2$ psql
psql (11.2)
Type "help" for help.

postgres=# begin ;
BEGIN
postgres=# create table important_table(id int );
CREATE TABLE
postgres=#
postgres=# insert into important_table values (10);
INSERT 0 1
postgres=# commit ;
COMMIT
postgres=# select now();
now
-------------------------------
2019-03-28 17:27:54.330591+00
(1 row)

postgres=# drop table important_table ;
DROP TABLE
postgres=# CREATE TABLE random_table AS SELECT p, md5(random()::text) FROM generate_Series(1,100000)p;
SELECT 100000

(In OpenShift you can open up a remote shell like the above using oc rsh mycluster-6d6974c44d-jbslq)

Now let’s go ahead and try to recover to PITR target, that is right before the table got dropped. We expect to have important_table back and random_table which was created after that timestamp NOT be recovered.

$ pgo restore mycluster --pitr-target="2019-03-28 17:27:54.330591+00" --backup-opts="--type=time --target-action=promote">Warning: If currently running, the primary database in this cluster will be stopped and recreated as part of this workflow!
WARNING: Are you sure? (yes/no): yes
restore performed on mycluster to mycluster-prdg opts=--type=time --target-action=promote pitr-target=2019-03-28 17:27:54.330591+00
workflow id c11eae16-96cc-4a7d-b354-bda071ad07bb

New restore container is created

$ kubectl get pod
NAME READY STATUS RESTARTS AGE
backrest-backup-mycluster-6rqsw 0/1 Completed 0 4m
mycluster-backrest-shared-repo-568fb89978-xqw6r 1/1 Running 0 41s
mycluster-stanza-create-wsj7m 0/1 Completed 0 5m
postgres-operator-5876b4b65d-9p5sl 3/3 Running 0 28m
restore-mycluster-kutr-7s949 0/1 ContainerCreating 0 11s

kubectl log restore-mycluster-kkcv-q89ll

.. some lines are truncated.

PITR_TARGET is not empty [2019-03-28 17:27:54.330591+00]
+ '[' '2019-03-28 17:27:54.330591+00' = '' ']'
+ echo PITR_TARGET is not empty '[2019-03-28' '17:27:54.330591+00]'
+ pgbackrest restore --type=time --target-action=promote '--target=2019-03-28 17:27:54.330591+00'

After restore is complete.

$ kubectl get pod
NAME READY STATUS RESTARTS AGE
backrest-backup-mycluster-6rqsw 0/1 Completed 0 5m
mycluster-backrest-shared-repo-568fb89978-xqw6r 1/1 Running 0 1m
mycluster-prdg-69559bb7b-mm47x 1/1 Running 0 28s
mycluster-stanza-create-wsj7m 0/1 Completed 0 6m
postgres-operator-5876b4b65d-9p5sl 3/3 Running 0 29m
restore-mycluster-kutr-7s949 0/1 Completed 0 1m

Now the moment of truth, are we able to recover the important_table?

kubectl exec -it mycluster-prdg-69559bb7b-mm47x -- /bin/bash
sh-4.2$ psql
psql (11.2)
Type "help" for help.

postgres=# \dt
List of relations
 Schema | Name | Type | Owner
--------+-----------------+-------+----------
 public | important_table | table | postgres
 public | primarytable | table | postgres
(2 rows)

postgres=# select * from important_table ;
id
----
10
(1 row)

It is important to understand the timestamp of backup especially when you are trying point-in-time recovery. If you planning to restore at time "T0" it is important to use backup taken before time "T0". In the next example, let's take another backup, in this case Incremental. This is to demonstrate a scenario where you are trying to restore backup which was taken after your intended recovery target point.

(pgBackRest supports not only full but also incremental and differential backup).

Before backup, let’s first drop important_table that was created.

postgres=# drop table important_table ;
DROP TABLE

Followed by another backup

pgo backup mycluster --backup-type=pgbackrest --backup-opts="--type=incr"
created Pgtask backrest-backup-mycluster

pgo show backup mycluster --backup-type=pgbackrest

backrest : mycluster
stanza: db
status: ok
cipher: none

db (current)
wal archive min/max (11-1): none present

full backup: 20190328-172543F
timestamp start/stop: 2019-03-28 17:25:43 / 2019-03-28 17:26:02
wal start/stop: 000000010000000000000002 / 000000010000000000000002
database size: 30MB, backup size: 30MB
repository size: 3.6MB, repository backup size: 3.6MB

incr backup: 20190328-172543F_20190328-173255I
timestamp start/stop: 2019-03-28 17:32:55 / 2019-03-28 17:33:05
wal start/stop: 000000020000000000000004 / 000000020000000000000004
database size: 30MB, backup size: 2.7MB
repository size: 3.6MB, repository backup size: 304.8KB
backup reference list: 20190328-172543F

Let’s try to restore again, but this time, by default it will use last backup which was taken after important_table was dropped.

$ pgo restore mycluster --pitr-target="2019-03-28 17:27:54.330591+00" --backup-opts="--type=time --target-action=promote"
Warning: If currently running, the primary database in this cluster will be stopped and recreated as part of this workflow!
WARNING: Are you sure? (yes/no): yes
restore performed on mycluster to mycluster-imyr opts=--type=time --target-action=promote pitr-target=2019-03-28 17:27:54.330591+00
workflow id 55332892-cb60-436a-876d-7ebb740f77d7

This time after the restore.

$ kubectl get pod
NAME READY STATUS RESTARTS AGE
backrest-backup-mycluster-2shf6 0/1 Completed 0 2m
mycluster-backrest-shared-repo-554ccfb84d-pt4b6 1/1 Running 0 1m
mycluster-imyr-78cf699566-ms24l 1/1 Running 0 24s
postgres-operator-5876b4b65d-9p5sl 3/3 Running 0 34m
restore-mycluster-kutr-7s949 0/1 Completed 0 5m
restore-mycluster-wclg-4mwmt 0/1 Completed 0 1m

$ kubectl exec -it mycluster-imyr-78cf699566-ms24l -- /bin/bash
postgres=# \dt
List of relations
Schema | Name | Type | Owner
--------+--------------+-------+----------
public | primarytable | table | postgres
(1 row)

Now let’s use proper backup to have the ability to restore to point in time.

$ pgo restore mycluster --pitr-target="2019-03-28 17:27:54.330591+00" --backup-opts="--type=time --target-action=promote --set=20190328-172543F"
Warning: If currently running, the primary database in this cluster will be stopped and recreated as part of this workflow!
WARNING: Are you sure? (yes/no): yes
restore performed on mycluster to mycluster-oojw opts=--type=time --target-action=promote --set=20190328-172543F pitr-target=2019-03-28 17:27:54.330591+00
workflow id 29f47ec1-f49c-4248-ba1e-73f557974de0

Now again you have your important_table

kubectl exec -it mycluster-oojw-7bd8897b98-fhpqb -- /bin/bash
sh-4.2$ psql
psql (11.2)
Type "help" for help.

postgres=# \dt
List of relations
 Schema | Name | Type | Owner
--------+-----------------+-------+----------
 public | important_table | table | postgres
 public | primarytable | table | postgres
(2 rows)

postgres=# select * from important_table ;
id
----
10
(1 row)

One last thing, I would like to show is how you handle retention. Detailed retention options with pgBackRest can be found here.

$ pgo backup mycluster --backup-type=pgbackrest --backup-opts="--type=full --repo1-retention-full=1"
created Pgtask backrest-backup-mycluster

$ pgo show backup mycluster --backup-type=pgbackrest

backrest : mycluster
stanza: db
status: ok
cipher: none

db (current)
wal archive min/max (11-1): none present

full backup: 20190328-174140F
timestamp start/stop: 2019-03-28 17:41:40 / 2019-03-28 17:42:01
wal start/stop: 000000040000000000000004 / 000000040000000000000004
database size: 30MB, backup size: 30MB
repository size: 3.6MB, repository backup size: 3.6MB

So there you have it, the best of both worlds. Crunchy PostgreSQL Operator providing full support of pgBackRest to have the ability to archive transaction log and the ability to do backup and restore.