Advanced PostgreSQL high-availability and disaster recovery strategies designed to maximize uptime involve distributing database clusters across multiple data centers. While on the surface this may seem intuitive (e.g. eliminating outages due to a single point of failure), there are many nuances to consider, including avoiding the dreaded split-brain problem.
To set up a PostgreSQL deployment that spans data centers (or regions, or Kubernetes clusters), and to mitigate the risk of a split-brain scenario, you need to ensure that each Postgres instance can communicate properly with the others. In Kubernetes, one way to do this is with a technique called “federation”. Federated Kubernetes clusters are able to communicate with each other, coordinate changes, and provide resiliency for applications that have high uptime requirements.
As of this writing, federation in Kubernetes is still an area of ongoing development. While Kubernetes federation continues to mature, we wanted to provide a way to deploy PostgreSQL clusters managed by the open source PostgreSQL Operator that can span multiple Kubernetes clusters. This can be accomplished with a few environmental prerequisites:
- Two Kubernetes clusters
- S3, or an external storage system that uses the S3 protocol
At a high level, the PostgreSQL Operator follows the “active-standby” data center deployment model for managing PostgreSQL clusters across Kubernetes clusters. In one Kubernetes cluster, the PostgreSQL Operator deploys PostgreSQL as an “active” PostgreSQL cluster, which means it has one primary and one or more replicas. In another Kubernetes cluster, the PostgreSQL cluster is deployed as a “standby” cluster: every PostgreSQL instance is a replica.
A side effect of this is that in each of the Kubernetes clusters, the PostgreSQL Operator can be used to deploy both active and standby PostgreSQL clusters, allowing you to mix and match! While the mixing and matching may not be ideal for how you deploy your PostgreSQL clusters, it does allow you to perform online moves of your PostgreSQL data to different Kubernetes clusters as well as manual online upgrades.
Lastly, while this feature does extend high-availability, promoting a standby cluster to an active cluster is not automatic. While the PostgreSQL clusters within a Kubernetes cluster do support self-managed high-availability, a cross-cluster deployment requires someone to specifically promote the cluster from standby to active.
So, presuming you've deployed the Postgres Operator to two different Kubernetes clusters, how do you create a multi-Kubernetes cluster PostgreSQL deployment? (You can try this example out in a single Kubernetes cluster, too!)
While one of the main goals of the PostgreSQL Operator is to make complex database management tasks easier, there are certain things that are more challenging to abstract away. That being said, while we believe this method for creating an active-standby multi-cluster PostgreSQL deployment (say that ten times fast!) is easier than doing it manually, it still takes some effort.
First, let's create the active PostgreSQL cluster. We are going to create a cluster that has two replicas, pushes its backups and WAL archives to S3, and also deploys pgBouncer. We can accomplish this with the command below:
# Note: the flags below the first line are reconstructed from context; verify
# against your pgo version's `pgo create cluster --help` output.
pgo create cluster hippo --pgbouncer --replica-count=2 \
  --pgbackrest-storage-type=local,s3 \
  --password-superuser=supersecurepassword \
  --password-replication=anothersecurepassword \
  --password=databasepassword
Notice that we include an explicit password for the PostgreSQL superuser, the replication user, and the standard PostgreSQL user. Please take note of these: in particular, the superuser and replication user credentials are used by the standby PostgreSQL cluster not only to configure itself properly, but also to be seamlessly available in the event that it needs to be promoted.
(If you are trying out this example in a single Kubernetes cluster, you can also use the --secret-from flag to bootstrap the credentials.)
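As a side note, the Operator stores these credentials in Kubernetes Secrets, which is how you can retrieve them later. A minimal sketch, assuming the 4.x naming convention of `<cluster>-<user>-secret` (the secret names here are an assumption drawn from the Operator's documentation, not from this post):

```shell
# Assumed naming convention: the Operator creates one Secret per managed
# user, named <cluster>-<user>-secret.
cluster=hippo
superuser_secret="${cluster}-postgres-secret"       # superuser credentials
replication_secret="${cluster}-primaryuser-secret"  # replication credentials
echo "$superuser_secret $replication_secret"
# → hippo-postgres-secret hippo-primaryuser-secret

# To inspect one on a live cluster, you would run something like:
#   kubectl get secret "$superuser_secret" -o jsonpath='{.data.password}' | base64 -d
```

Having these secret names on hand makes it easy to confirm that the standby cluster was bootstrapped with matching credentials.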
Now let's use the Postgres Operator to create the standby cluster. Once the active cluster is ready, you can run the commands below to bootstrap your standby cluster:
# Note: the flags below the first line are reconstructed from context; verify
# against your pgo version's `pgo create cluster --help` output.
pgo create cluster hippo-standby --standby --pgbouncer --replica-count=2 \
  --pgbackrest-storage-type=s3 \
  --pgbackrest-repo-path=/backrestrepo/hippo-backrest-shared-repo \
  --password-superuser=supersecurepassword \
  --password-replication=anothersecurepassword \
  --password=databasepassword
Note the use of the --pgbackrest-repo-path flag: it points to the name of the pgBackRest repository that is used by the original (active) PostgreSQL cluster.
What is happening here? Looking at the architecture diagram above, we create a PostgreSQL "leader" in the standby cluster that reads in the changes pushed to the pgBackRest repository out in S3. These changes are then sent on to the other PostgreSQL replicas. This is known as cascading replication: the other replicas are cascading replicas, i.e. replicas replicating from a database server that is itself replicating from another database server.
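You can observe cascading replication directly by querying `pg_stat_replication` on the standby leader: each row is a replica streaming from it. A hedged sketch (the pod name and `psql` invocation are illustrative assumptions), with the parsing shown against sample output:

```shell
# On a live cluster you would run something like (pod name is illustrative):
#   kubectl exec hippo-standby-dcff544d6-s6d58 -- \
#     psql -Atc "SELECT application_name, state FROM pg_stat_replication;"
# Sample of what that query might return on the standby leader:
sample='hippo-standby-abcd|streaming
hippo-standby-efgh|streaming'

# Each "streaming" row is a cascading replica replicating from the standby leader.
count=$(printf '%s\n' "$sample" | grep -c '|streaming')
echo "cascading replicas: $count"
# → cascading replicas: 2
```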
Give the standby cluster a few moments to bootstrap. When it is ready, you will see activity similar to this in the leader PostgreSQL instance:
kubectl logs hippo-standby-dcff544d6-s6d58
Thu Mar 19 18:16:54 UTC 2020 INFO: Node standby-dcff544d6-s6d58 fully initialized for cluster standby and is ready for use
2020-03-19 18:17:03,390 INFO: Lock owner: standby-dcff544d6-s6d58; I am standby-dcff544d6-s6d58
2020-03-19 18:17:03,454 INFO: Lock owner: standby-dcff544d6-s6d58; I am standby-dcff544d6-s6d58
2020-03-19 18:17:03,598 INFO: no action. i am the standby leader with the lock
2020-03-19 18:17:13,389 INFO: Lock owner: standby-dcff544d6-s6d58; I am standby-dcff544d6-s6d58
2020-03-19 18:17:13,466 INFO: no action. i am the standby leader with the lock
Because standby clusters are effectively read-only, certain functionality that involves making changes to a database (e.g. user modifications) is blocked while a cluster is in standby mode. Additionally, backups and restores are blocked as well. While pgBackRest does support backups from standbys, this requires direct access to the primary database, which cannot be done until the PostgreSQL Operator supports Kubernetes federation. If a blocked function is called on a standby cluster, whether via the pgo client or a direct call to the API server, the call will return an error.
At some point, and this day may never come, you may have to promote the standby cluster. (If you're doing this as part of a cluster migration, you certainly will want to do this, but hopefully never as part of a disaster recovery :-) So, how do you promote a standby cluster with the Postgres Operator?
The mechanics of promotion are much simpler than setting up a standby cluster, but it's important that you follow the order of operations, otherwise you may inadvertently create a split-brain situation for your cluster.
The most important part of promoting a standby cluster is not the promotion itself, but ensuring that the old active cluster is disabled (i.e. shut down -- this is a type of fencing). If the entire Kubernetes cluster that houses the active PostgreSQL cluster is down, this decision is in part made for you, but you will still want to ensure that when the old active cluster comes back online, it does not try to become a primary.
To start the promotion process, if you can still access your old active cluster, issue a shutdown command like so:
pgo update cluster hippo --shutdown
The net effect of this is that all the Kubernetes Deployments that the PostgreSQL Operator manages will now manage 0 Pods:
$ kubectl get deployments --selector pg-cluster=hippo
NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
hippo                        0/0     0            0           32m
hippo-backrest-shared-repo   0/0     0            0           32m
hippo-kvfo                   0/0     0            0           27m
hippo-lkge                   0/0     0            0           27m
hippo-pgbouncer              0/0     0            0           31m
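Before promoting, it is worth double-checking that every Deployment really is scaled to zero. A small sketch of such a check (the helper function is ours, not part of pgo), run here against sample `kubectl` output:

```shell
# all_scaled_down reads `kubectl get deployments` output on stdin and
# fails if any Deployment's READY column is not "0/0".
all_scaled_down() {
  awk 'NR > 1 && $2 != "0/0" { bad = 1 } END { exit bad }'
}

sample='NAME                         READY   UP-TO-DATE   AVAILABLE   AGE
hippo                        0/0     0            0           32m
hippo-backrest-shared-repo   0/0     0            0           32m
hippo-pgbouncer              0/0     0            0           31m'

# On a live cluster: kubectl get deployments --selector pg-cluster=hippo | all_scaled_down
if printf '%s\n' "$sample" | all_scaled_down; then
  echo "fenced: safe to promote"
else
  echo "pods still running: do not promote"
fi
```

If any Deployment still reports running Pods, the old active cluster is not fully fenced and promoting the standby risks a split-brain.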
Once the old active cluster is shut down, you can safely promote the standby cluster with the following command:
pgo update cluster hippo-standby --promote-standby
Check out the logs on the PostgreSQL leader in the standby cluster and you can see the promotion take place:
$ kubectl logs hippo-standby-dcff544d6-s6d58
2020-03-19 18:28:11,919 INFO: Reloading PostgreSQL configuration.
2020-03-19 18:28:16,792 INFO: Lock owner: standby-dcff544d6-s6d58; I am standby-dcff544d6-s6d58
2020-03-19 18:28:16,850 INFO: Reaped pid=5377, exit status=0
2020-03-19 18:28:17,024 INFO: no action. i am the leader with the lock
2020-03-19 18:28:26,792 INFO: Lock owner: standby-dcff544d6-s6d58; I am standby-dcff544d6-s6d58
2020-03-19 18:28:26,924 INFO: no action. i am the leader with the lock
At this point, the standby cluster becomes the new active cluster! If you are using pgBouncer (which we are in this example), you will need to rotate its password in order for it to continue to correctly operate. You can do so with the following command:
pgo update pgbouncer --rotate-password hippo-standby
And that's it! If you are migrating from one PostgreSQL cluster to another, you should be done at this point, though you may want to delete the old PostgreSQL cluster.
However, what if you want to create a new standby cluster?
Your failed Kubernetes cluster has come back online, but you have a newly promoted active PostgreSQL cluster in your other data center. Now what?
Let's turn the old active PostgreSQL cluster into a standby!
First, you will need to recreate all of the PVCs and reinitialize the cluster as a standby using the S3 repository. Fortunately, the PostgreSQL Operator can do this for you with a single command. Because this is a destructive action (data will only be retained if the associated Storage Classes and/or Persistent Volumes have the appropriate reclaim policy configured), a warning is displayed when you attempt to enable standby mode:
$ pgo update cluster hippo --enable-standby
Enabling standby mode will result in the deletion of all PVCs for this cluster!
Data will only be retained if the proper retention policy is configured for any associated storage classes and/or persistent volumes.
Please proceed with caution.
WARNING: Are you sure? (yes/no): yes
updated pgcluster hippo
This will take a few moments to set up. To verify that the standby was created, you can check the information in the Kubernetes DCS (distributed configuration store):
kubectl get cm hippo-config -o yaml | grep standby
and ensure that the new PVCs were created (which you can tell by their age):
$ kubectl get pvc --selector pg-cluster=hippo
NAME              STATUS   VOLUME          CAPACITY   AGE
hippo             Bound    crunchy-pv251   1Gi        33s
hippo-kvfo        Bound    crunchy-pv174   1Gi        29s
hippo-lkge        Bound    crunchy-pv228   1Gi        26s
hippo-pgbr-repo   Bound    crunchy-pv295   1Gi        22s
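Before starting the cluster back up, you may also want to confirm that every PVC is actually Bound. A sketch of that check against sample output (on a live cluster you would pipe `kubectl get pvc --selector pg-cluster=hippo` in instead):

```shell
# Count PVCs whose STATUS column is anything other than "Bound".
sample='NAME              STATUS   VOLUME          CAPACITY   AGE
hippo             Bound    crunchy-pv251   1Gi        33s
hippo-pgbr-repo   Bound    crunchy-pv295   1Gi        22s'

unbound=$(printf '%s\n' "$sample" | awk 'NR > 1 && $2 != "Bound" { n++ } END { print n + 0 }')
echo "unbound PVCs: $unbound"
# → unbound PVCs: 0
```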
Once the standby cluster is configured, you can start it up so it begins streaming in the live data:
pgo update cluster hippo --startup
And that's it, you have healed your PostgreSQL standby cluster!
As I hope I demonstrated above, the Postgres Operator does make it easier to orchestrate advanced high-availability and disaster recovery setups with a few simple commands, but don't be lulled into a false sense of security: these kinds of architectures are still complicated. While having a multi-cluster active-standby setup should help overall with the resiliency and availability of your data, you still need to understand the complications of this deployment model.
As Kubernetes continues to advance, you can be sure that the PostgreSQL Operator will take advantage of its primitives to continue to make it easier to have multi-Kubernetes cluster setups and, in the long term, make the deployment model slightly less complicated.
Jonathan S. Katz
May 7, 2020