Effective PostgreSQL Cluster Configuration & Management Using PGO v5.1

Andrew L'Ecuyer

5 min read

Modern day production ready Postgres solutions require quite a bit of sophistication and automation. Changes need to be applied in a uniform and safe way. DevOps and SRE teams need to be in control system updates while limiting disruption to their users.

With the release of PGO v5.1, we are excited to announce enhancements in each of these areas. Not only does PGO v5.1 now automatically rollout all PostgreSQL configuration changes, but it allows you to protect your running databases against outages due to voluntary disruptions. Additionally, manual switchover and failover support means you can fully control your current cluster topology via your PostgresCluster spec.

PGO v5.1 delivers on what it means to be a production-ready PostgreSQL solution by providing the key features needed to configure, manage and protect your production PostgreSQL databases. Let’s take a look at what each of these new features looks like in action.

Rolling Database Restarts

Prior to PGO v5.1, if you changed a PostgreSQL configuration setting that required database restart, then it was necessary to manually trigger a rolling restart to apply that change. As of PGO v5.1, however, this step is no longer required. PGO will now detect and safely rollout configuration changes requiring a database restart.

To see this in action, update your PostgresCluster spec with a configuration change that will require a database restart. For this example we will update the shared_buffers to 256MB:

$ kubectl patch postgrescluster hippo --type=merge -p '{"spec":{"patroni":{"dynamicConfiguration":
{"postgresql":{"parameters":{"shared_buffers": "256MB"}}}}}}'
postgrescluster.postgres-operator.crunchydata.com/hippo patched

At this point, not only will PGO apply the desired configuration change, but it will also detect that a database restart is required. It will then restart all PostgreSQL instances in the cluster to automatically rollout the change.

Looking at the logs for any PostgreSQL instance in the cluster, you should see the following:

$ kubectl logs hippo-instance1-jl7v-0 -c database | grep shared_buffers
2022-04-03 18:23:31,347 INFO: Changed shared_buffers from 16384 to 256MB (restart might be required)

And by looking at the PostgreSQL logs you can see that the database was restarted:

$ kubectl exec hippo-instance1-jl7v-0 -c database -- bash -c 'cat /pgdata/pg13/log/*.log' | tail -n 5
...
2022-04-03 18:33:09.234 UTC [3188] LOG:  database system was shut down at 2022-04-03 18:33:08 UTC
2022-04-03 18:33:09.276 UTC [3181] LOG:  database system is ready to accept connections

You can also see that the setting has been applied via psql:

$ kubectl exec hippo-instance1-jl7v-0 -c database -- psql -t -c 'show shared_buffers;'
 256MB

This therefore demonstrates how PGO v5.1 makes PostgreSQL configuration easier than ever before.

Pod Disruption Budgets

Voluntary disruptions are often a reality in any Kubernetes environment. For instance, someone might drain a node containing a running PostgreSQL database. And when this occurs, it is important that all production databases be protected from an unexpected outage to the fullest extent possible.

We are therefore excited to announce support for protection against voluntary cluster disruptions in PGO v5.1. By simply defining more than once replica for an instance set or a PgBouncer deployment, PGO will now automatically create a Pod Disruption Budget (PDB) that will ensure at least one Pod for that instance set or PgBouncer deployment remains available when a voluntary disruption occurs. For instance, you can see this in action by defining two replicas for a PgBouncer deployment:

$ kubectl patch postgrescluster hippo --type='json' \
-p='[{"op":"replace","path":"/spec/pro
xy/pgBouncer/replicas","value":2}]'
postgrescluster.postgres-operator.crunchydata.com/hippo patched
$ kubectl get pdb
NAME              MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
hippo-pgbouncer   1               N/A               1                     4s

You can also set the minAvailable field for any instance set or PgBouncer deployment. This will cause PGO to reconcile a PDB that enforces that the defined number of Pods is always available.

For example, the following will ensure at least two Pods are available for an instance set with five replicas:

spec:
  instances:
    - name: instance1
      replicas: 3
      minAvailable: 2

Once the above spec is applied, PGO will reconcile a corresponding PDB:

$ kubectl get pdb
NAME              	MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
hippo-set-instance1   2           	N/A           	1                 	110s

For more information about voluntary disruptions and PDBs, please see the following:

Manual Switchovers & Failovers

While manual switchovers and/or failover are not typically required during normal operations (since the HA system will automatically failover to a replica when needed), there are certain circumstances where this functionality can be beneficial. For instance, you might want to move the primary to an instance that is running on a specific node prior to performing maintenance. Or, you might simply want to test various failover scenarios in your environment. Whatever the use case might be, PGO now has you covered.

With PGO v5.1, it is now possible to manually switchover and failover for your PostgreSQL clusters, all via the PostgresCluster spec. For instance, let's say you want to switchover to an instance called hippo-instance1-t97d within your cluster. First, you would add the following to your PostgresCluster spec to define the instance that you want to switchover to:

spec:
  patroni:
    switchover:
      enabled: true
      targetInstance: hippo-instance1-t97d

Then, annotate PostgresCluster to trigger the actual switchover:

$ kubectl annotate -n postgres-operator postgrescluster hippo \
  postgres-operator.crunchydata.com/trigger-switchover="$(date)"
postgrescluster.postgres-operator.crunchydata.com/hippo annotated

Once PGO detects the annotation, the switchover will occur. This means instance hippo-instance1-t97d will now be promoted to primary. You can verify this by looking for the Pod in the cluster with a value of master for the postgres-operator.crunchydata.com/role label.

$ kubectl get pods \
--selector=postgres-operator.crunchydata.com/role=master,postgres-operator.crunchydata.com/cluster=hippo
NAME                 	READY   STATUS	RESTARTS   AGE hippo-instance1-t97d-0   4/4
	Running   0      	26h

As you can see, Pod hippo-instance1-t97d-0 now has the master role, which means the switchover was successful.

Next Steps

Thank you for taking the time to explore a few of the new features included in PGO v5.1! There is a lot more to get excited about in this release, so I encourage you to see the PGO documentation for more details. For information about upgrading to PGO v5.1, please see the Upgrade Documentation. If you are installing PGO for the first time, please be sure to check out the quickstart.

Avatar for Andrew L'Ecuyer

Written by

Andrew L'Ecuyer

June 1, 2022 More by this author