David Steele
5 min read
Latest Articles
- Accessing Large Language Models from PostgreSQL
- 8 Steps in Writing Analytical SQL Queries
- 4 Ways to Create Date Bins in Postgres: interval, date_trunc, extract, and to_char
- Crunchy Postgres for Kubernetes 5.7: Faster Backups, Automated Snapshots, Postgres 17 and More
- pg_parquet: An Extension to Connect Postgres and Parquet
Introducing pgBackRest Multiple Repository Support
The pgBackRest team is pleased to announce the introduction of multiple repository support in v2.33. Backups already provide redundancy by creating an offline copy of your PostgreSQL cluster that can be used in disaster recovery. Multiple repositories allow you to have copies of your backups and WAL archives in separate locations to increase your redundancy and provide even more protection for your data. This feature is the culmination of many months of hard work, so let's delve into why we think multiple repositories are so important and how they can help preserve your data.
If you are unfamiliar with pgBackRest, general repository configuration, or configuring PostgreSQL to work with pgBackRest, please read the pgBackRest Quick Start before proceeding.
Configuration
Up to four repositories may be configured and each one can be any repo type, e.g. S3 or Posix. The configuration below defines two repositories. repo1
is Posix and stored on locally-mounted NFS volume; the repository should always be located off the PostgreSQL server in case disaster strikes. repo2
is stored on Azure.
$ cat /etc/pgbackrest/pgbackrest.conf
[demo]
pg1-path=/var/lib/postgresql/13/demo
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo2-type=azure
repo2-azure-account=pgbackrest
repo2-azure-container=demo-container
repo2-azure-key=YXpLZXk=
repo2-path=/demo-repo
repo2-retention-full=8
Note that the retention has been configured differently on each repository. repo1
has a shorter retention to save space but still provide several backups and plenty of WAL on local storage that is both fast and cheap (in terms of bandwidth cost) to access. repo2
has a longer retention and is stored in Azure where storage is cheap but retrieving data is both slower and more expensive than repo1
. repo2
is both a fail-safe in case something goes wrong with repo1
and a resource to restore backups from remote sites.
pgBackRest will treat repo1
with higher priority than repo2
for certain commands like restore and archive-get. In general, the lower-numbered repositories should be faster and/or cheaper than the higher-numbered repositories.
Usage
Now that the repositories are configured, run stanza-create
. This will initialize the demo
stanza in each repository.
$ pgbackrest --stanza=demo stanza-create
<...>
INFO: stanza-create for stanza 'demo' on repo1
INFO: stanza-create for stanza 'demo' on repo2
<...>
Once the repositories have been initialized it is a good idea to run check
to ensure that everything is working. Note that WAL segments are being pushed to both repositories.
$ pgbackrest --stanza=demo check
<...>
INFO: check repo1 configuration (primary)
INFO: check repo2 configuration (primary)
INFO: check repo1 archive for WAL (primary)
INFO: WAL segment 000000010000000000000003 successfully archived to '/var/lib/pgbackrest/archive/demo/13-1/0000000100000000/000000010000000000000003-c981b4ddc8c1437c539eda05427a6aa454a0923e.gz' on repo1
INFO: check repo2 archive for WAL (primary)
INFO: WAL segment 000000010000000000000003 successfully archived to '/demo-repo/archive/demo/13-1/0000000100000000/00000001000000000000000
Now a backup should be run for each repository. Most commands operate automatically on all repos but backup requires the repo to be specified. Each repo is likely to have different retention and backup schedules so backups should be run independently.
pgbackrest --stanza=demo --repo=1 backup
pgbackrest --stanza=demo --repo=2 backup
Info for the repositories is shown in a unified fashion with the organizing unit being the stanza. This way all the backups for a stanza can be seen together and the most recent backup easily identified since the list is sorted oldest to newest.
$ pgbackrest info
stanza: demo
status: ok
cipher: none
db (current)
wal archive min/max (13): 000000010000000000000003/000000010000000000000006
full backup: 20210331-155726F
timestamp start/stop: 2021-03-31 15:57:26 / 2021-03-31 15:57:30
wal start/stop: 000000010000000000000005 / 000000010000000000000005
database size: 23.1MB, database backup size: 23.1MB
repo1: backup set size: 2.8MB, backup size: 2.8MB
full backup: 20210331-155736F
timestamp start/stop: 2021-03-31 15:57:36 / 2021-03-31 15:57:41
wal start/stop: 000000010000000000000006 / 000000010000000000000006
database size: 23.1MB, database backup size: 23.1MB
repo2: backup set size: 2.8MB, backup size: 2.8MB
It is also possible to get info for a single repository by specifying the --repo
option.
$ pgbackrest --repo=2 info
stanza: demo
status: ok
cipher: none
db (current)
wal archive min/max (13): 000000010000000000000003/000000010000000000000006
full backup: 20210331-155736F
timestamp start/stop: 2021-03-31 15:57:36 / 2021-03-31 15:57:41
wal start/stop: 000000010000000000000006 / 000000010000000000000006
database size: 23.1MB, database backup size: 23.1MB
repo2: backup set size: 2.8MB, backup size: 2.8MB
When restoring, pgBackRest will automatically select the best backup from the repositories based on your criteria. Here is an example of time-based recovery:
pgbackrest --stanza=demo --delta --type=time --target="2021-03-31 15:57:31-04" restore
<...>
INFO: repo1: restore backup set 20210331-155726F
<...>
In this case only the backup from repo1
was valid because of the time restriction. Even if a later time was specified that would seem to favor the later backup in repo2
, the same backup from repo1
will be selected. This is because pgBackRest tends to prefer the repo with higher priority, i.e. repo1 over repo2, under the assumption that it will be faster and/or cheaper than a repository with lower priority.
If you want to restore from a specific repository then simply specify the preferred repository using --repo
. The archive-get
command generated for recovery will always search all repositories in priority order for WAL segments.
Conclusion
Multiple repositories allow for more redundancy as well as cost savings and performance improvements by allowing a repository to be located close to the PostgreSQL clusters while also having a repository located safely out in the cloud or in another data center.
For more information about multiple repository support, see the User Guide.
Related Articles
- Accessing Large Language Models from PostgreSQL
5 min read
- 8 Steps in Writing Analytical SQL Queries
8 min read
- 4 Ways to Create Date Bins in Postgres: interval, date_trunc, extract, and to_char
8 min read
- Crunchy Postgres for Kubernetes 5.7: Faster Backups, Automated Snapshots, Postgres 17 and More
4 min read
- pg_parquet: An Extension to Connect Postgres and Parquet
4 min read