Stateful Postgres Storage Using Kubernetes
Kubernetes was developed originally as an orchestration system for stateless applications. Today, Kubernetes is the backbone of countless full-stack applications with, notably, a database as part of the stack. So, a question we often hear is:
How can Kubernetes be the foundation of that most stateful application of all, the database?
Kubernetes & Storage
Let’s say you maintain a Postgres database and you’ve been tasked with moving it to Kubernetes. You could just start up a Kubernetes pod running a Postgres image, load the data, and call it a day.
But as soon as that pod goes away, so will that small but critical database, because the database storage existed as part of that ephemeral pod.
When you created that pod, you told the underlying machine to reserve a certain amount of resources for it: some compute power, some memory, and, automatically, some storage. But as soon as that pod goes away, all of that is released back into the pool.
And pods do go away even when you don’t want them to. Maybe a pod exceeded the resource limits you gave it, or maybe the process hit a fatal exception, or... well, there are a lot of ways a pod can die. Let’s call this the first lesson of Kubernetes: pods are ephemeral. As the old saying goes, treat them as cattle, not pets. While this model of deployment is ideal for application services, we need something different to handle the database in Kubernetes.
So, how can we create a database on Kubernetes and not worry about the ephemeral pod? The answer is “Volumes”, or more specifically certain types of volumes that are independent of the pod lifecycle.
For an example of a volume type that is not independent, let’s take emptyDir. The Kubernetes documentation on emptyDir tells us that when a pod with an emptyDir is deleted, the data in there will be “deleted permanently”. For backing a Postgres instance, this is not a good idea.
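To make the problem concrete, here is a minimal sketch of a Postgres pod whose data directory sits on an emptyDir volume. The pod name, volume name, and password value are hypothetical placeholders; the point is that nothing in this manifest keeps the data once the pod is removed.

```yaml
# Anti-pattern sketch: Postgres data on an emptyDir volume.
# Pod name, volume name, and password value are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: postgres-ephemeral
spec:
  containers:
    - name: postgres
      image: postgres:14
      env:
        - name: POSTGRES_PASSWORD
          value: example-only   # use a Secret in practice
      volumeMounts:
        - name: scratch
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: scratch
      emptyDir: {}   # its contents are deleted when the pod leaves its node
```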
What we want here is a volume that won’t go away when the pods are removed. Luckily, Kubernetes has the concept of a Persistent Volume (PV): a volume that will persist no matter what happens with the pod (sort of; we’ll get into that later).
What’s even better, a Persistent Volume is an abstraction for any kind of storage. So, do you want to put your storage somewhere remote, like on AWS, Azure, Google, etc.? Then you can use a Persistent Volume. But maybe you want to use some of your Kubernetes node’s own storage? Then you can also use a Persistent Volume.
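As an illustration, a Persistent Volume backed by a directory on one of your nodes might look like the sketch below. It assumes a single-node test cluster (hostPath is not meant for multi-node production use), and the name, size, and path are hypothetical.

```yaml
# Sketch of a manually created PersistentVolume backed by node storage.
# The name, capacity, and path are hypothetical.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv-example
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the data if the claim is deleted
  hostPath:
    path: /mnt/data/postgres   # a directory on the node itself
```

To put the storage somewhere remote instead, you would replace the hostPath section with a volume source for your storage backend, typically a CSI driver from your cloud or storage vendor.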
As far as Kubernetes knows, a Persistent Volume is just some piece of storage somewhere, and its lifecycle is independent of the pod that’s using it. So if the pod goes away (say, it hits a memory limit and is OOMKilled), the storage is still there, ready to be used by your Postgres pod when it is recreated.
But first, we have to tell the Postgres pod to use that storage, and we have to tell other pods to leave our storage alone. We do that with a Persistent Volume Claim (PVC).
Persistent Volume Claim
A Persistent Volume Claim is a claim on a persistent volume, and not just any persistent volume. After all, it wouldn’t be great for your 20GB database if your Postgres pod tried to claim and use 1GB of storage.
A Persistent Volume Claim lets you request a certain amount of storage. But a Persistent Volume Claim is more than just a specification of a certain amount of storage. You might also want to specify the access mode for this storage or its StorageClass. For example, if you have data that changes quickly, you might want a different storage option than if you are taking long-lasting backups that never change.
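A claim that asks for a size, an access mode, and a Storage Class might look like the minimal sketch below. The claim name and the fast-ssd class name are hypothetical; you would use a class that actually exists on your cluster.

```yaml
# Sketch of a PersistentVolumeClaim; the claim and class names are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce            # mountable read-write by a single node
  storageClassName: fast-ssd   # which class of storage to draw from
  resources:
    requests:
      storage: 20Gi            # how much storage we are asking for
```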
By now you may have noticed that the Kubernetes developers are pretty good at using descriptive names and that’s true here: a storage class is a class of storage. That is, it is a category that defines certain behaviors and attributes of the storage. Or as the Kubernetes docs on Storage Class say, "Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators."
One of those behaviors is uniquely important and that’s the provisioner. That’s the field on the Storage Class that lets Kubernetes know how to dynamically provision new Persistent Volumes.
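For reference, a Storage Class is a small object, and its provisioner field is what ties it to the software that creates volumes. The sketch below is modeled on the local-path class that shows up on the cluster later in this article; treat the reclaimPolicy and volumeBindingMode values as illustrative. The annotation is the standard way to mark a class as the cluster default.

```yaml
# Sketch of a StorageClass; field values are illustrative.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
  annotations:
    # claims that don't name a class get this one
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rancher.io/local-path        # who creates new PVs for this class
reclaimPolicy: Delete                     # what happens to a PV when its claim is deleted
volumeBindingMode: WaitForFirstConsumer   # provision only once a pod uses the claim
```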
If you have a Kubernetes cluster that somehow doesn’t have any Storage Classes — or is missing a provisioner — then you won’t be able to dynamically create Volumes. In that case, the Kubernetes admin will have to manually provision those Volumes.
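Without a provisioner, a claim has to be matched to a volume the admin created by hand. One way to do that, sketched below, is to set storageClassName to an empty string, which turns off dynamic provisioning for the claim, and to name the volume explicitly. This assumes a manually created PV named postgres-pv-example already exists; that name and the claim name are hypothetical.

```yaml
# Sketch: bind a claim to a pre-created PV instead of provisioning one.
# Both names are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-static
spec:
  storageClassName: ""              # empty string: no dynamic provisioning
  volumeName: postgres-pv-example   # the manually created PV to bind to
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                 # must not exceed the PV's capacity
```

For the binding to succeed, the PV also needs no Storage Class (or an empty one), a compatible access mode, and at least the requested capacity.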
The main reason I’m mentioning Storage Classes to you, the Postgres expert who needs to migrate your database to Kubernetes, is to remind you that where you put your data matters. As I said above, Kubernetes doesn’t care whether your Persistent Volume is on-prem or with any particular cloud provider, but each of these environments has different storage backends, and those backends offer different options.
Mid-Article Kubernetes Summary
I’ve thrown a lot at you, so to summarize:
- Pods die, so an ephemeral pod can’t keep your data around on its own
- If you want your data to survive a pod death, you need a Persistent Volume
- In order for a Pod to use a Persistent Volume, you need to wire the Pod to the Persistent Volume through a Persistent Volume Claim
- Different Storage Classes offer different options
Putting it all together:
- Just like a Kubernetes cluster has compute and memory resources available that you can request for your pod, a Kubernetes cluster may have storage that you can request. The storage exists as a Persistent Volume and you request a certain amount of storage with a Persistent Volume Claim.
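Here is what that wiring can look like in a bare Pod manifest: a minimal sketch, assuming a claim named postgres-data already exists (the pod name, claim name, and password value are hypothetical). Setting PGDATA to a subdirectory is a common precaution, because the root of a freshly formatted volume may already contain files such as lost+found.

```yaml
# Sketch: a bare Postgres pod whose data lives on a PersistentVolumeClaim.
# Pod name, claim name, and password value are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: postgres-demo
spec:
  containers:
    - name: postgres
      image: postgres:14
      env:
        - name: POSTGRES_PASSWORD
          value: example-only                     # use a Secret in practice
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata  # subdirectory of the mount
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: postgres-data   # the pod references the claim, not the PV
```

If this pod is deleted and recreated with the same claim, Postgres finds its existing data directory instead of starting from scratch.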
Postgres on Kubernetes
With that in mind, how would you architect a Postgres instance with persistent storage on Kubernetes? Let’s get a napkin:
That’s OK from a Kubernetes perspective: we have a pod that runs the database and we have a Persistent Volume that holds the storage. If the pod goes away, the data is still there in the volume.
But from a Postgres perspective, we have Postgres saving our data and our WAL files (which are what backups and recovery rely on) to the same volume. That’s not great for some recovery scenarios: if that one volume is lost or corrupted, we lose both the data and the WAL we would use to recover it. So let’s add a little more redundancy to our system by pushing our WAL files to another Persistent Volume.
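One way to sketch that second volume with a plain Pod is to mount a separate claim and point Postgres's archive_command at it. This assumes two pre-existing claims, postgres-data and postgres-wal-archive (hypothetical names), and uses the simple cp-based archive_command pattern from the PostgreSQL documentation; the fsGroup value assumes the official image's postgres group (gid 999) and only helps on volume types that honor fsGroup. A real deployment would use a dedicated backup tool instead.

```yaml
# Sketch: data and archived WAL on two separate claims.
# Pod name, claim names, and password value are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: postgres-with-wal-archive
spec:
  securityContext:
    fsGroup: 999   # assumes the official image's postgres group (gid 999)
  containers:
    - name: postgres
      image: postgres:14
      # The image entrypoint runs "postgres" with these flags.
      args:
        - "-c"
        - "archive_mode=on"
        - "-c"
        - "archive_command=test ! -f /wal-archive/%f && cp %p /wal-archive/%f"
      env:
        - name: POSTGRES_PASSWORD
          value: example-only   # use a Secret in practice
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
        - name: wal-archive
          mountPath: /wal-archive   # completed WAL segments are copied here
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: postgres-data
    - name: wal-archive
      persistentVolumeClaim:
        claimName: postgres-wal-archive
```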
That’s better for storage persistence and recovery from backup. But we probably want to add a Postgres replica for high availability. What does that look like with persistent storage?
This is a little generic: I’m not getting into what’s pushing the WAL files to the Persistent Volume for backup storage. In theory, you might back up in other ways. But the general lesson here is that you probably want your primary storage separate from your backup storage. Maybe you want it really separate? You could use something like pgBackRest, which can push files to remote cloud-based storage.
Again, the general idea here is that you likely want two separate storage volumes: one for your database and one for your recovery material. There are a few ways to do that. I mean, if you wanted to, you could exec into your Postgres pod regularly, run pg_dump, and copy that output somewhere. That's not a production-ready solution, though.
The Postgres Operator & Storage
One of the great things about using an operator is that a lot of the storage handling is solved for you. With the Postgres Operator (PGO), when you spin up a Postgres instance, PGO can create the pod and the PVC according to your specifications and according to the needs of your Kubernetes cluster.
For instance, maybe you already have a Persistent Volume from a previous Postgres backup and you want to use that data to bootstrap a new cluster — we can do that. Or maybe you want to dynamically create a new Persistent Volume using a particular Storage Class or just want to use the default Storage Class — well, we can do that too with PGO.
(As I noted above, different commercial Kubernetes services offer different options for Storage Classes; in general, up-to-date clusters on AWS EKS, Azure AKS, Google GKE, etc., will have a default Storage Class. But you can always, and probably should, check which Storage Classes are available with kubectl get storageclass.)
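For the first case, bootstrapping from an existing volume, PGO v5 has a data migration feature driven by a dataSource.volumes section on the PostgresCluster. The sketch below is based on that feature, but field names can change between PGO releases, so check them against the CRD reference for your version; the cluster name, PVC name, and directory are hypothetical.

```yaml
# Sketch: bootstrap a new PostgresCluster from an existing data PVC.
# Field names follow PGO v5's data migration feature; verify for your version.
# Cluster name, PVC name, and directory are hypothetical.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-migrated
  namespace: postgres-operator
spec:
  dataSource:
    volumes:
      pgDataVolume:
        pvcName: old-hippo-pgdata   # existing PVC holding a PGDATA directory
        directory: old-hippo        # directory inside that volume to migrate
  postgresVersion: 14
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
```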
PGO creates Pods and PVCs for you
Here’s an example YAML manifest for a very basic Postgres instance, with one Postgres pod (no replicas):
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
  namespace: postgres-operator
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
      name: ''
      replicas: 1
  postgresVersion: 14
Notice that the postgrescluster object has a volumeClaimSpec under spec.backups (for the pgBackRest repository) and a dataVolumeClaimSpec under spec.instances (for the Postgres data). We keep those separate and independent of each other so you can configure each one on its own.
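For example, a variation on the hippo spec could put the Postgres data and the backup repository on different Storage Classes with different sizes. This is a fragment of a PostgresCluster spec only; the fast-ssd and standard-hdd class names and the 5Gi size are hypothetical.

```yaml
# Fragment of a PostgresCluster spec: data and backups on different classes.
# The class names and sizes are hypothetical.
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              storageClassName: standard-hdd   # cheaper storage for backups
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 5Gi
  instances:
    - dataVolumeClaimSpec:
        storageClassName: fast-ssd             # faster storage for live data
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
```

Leaving storageClassName out of either claim spec, as in the original manifest, means that claim uses the cluster's default Storage Class.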
Once I create that Postgres instance, I can check on the pods:
➜ kubectl get pods --namespace postgres-operator
NAME                READY   STATUS    RESTARTS   AGE
hippo-repo-host-0   2/2     Running   0          3m23s
hippo-00-6wh4-0     4/4     Running   0          3m23s
Wait, why do I have two pods if I only have one Postgres instance with no replicas?
hippo-repo-host-0 is running pgBackRest, our preferred backup solution, which is connected to its own local PersistentVolume. We can check the PersistentVolumeClaims to see that in action:
➜ kubectl get persistentvolumeclaims --namespace postgres-operator
NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hippo-00-l5gw-pgdata   Bound    pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            local-path     85s
hippo-repo1            Bound    pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            local-path     86s
Notice also that those PVCs have a status of Bound and tell us which volume they are bound to. They have a capacity of 1Gi (as I requested in the original YAML above) and an access mode of ReadWriteOnce (RWO), meaning the volume can be mounted read-write by a single node at a time.
And what about that StorageClass, local-path? That’s the default StorageClass on the Kubernetes cluster I’m using:
➜ ~ kubectl describe storageclass local-path
Name:            local-path
IsDefaultClass:  Yes
Provisioner:     rancher.io/local-path
Because I have a default Storage Class with a provisioner, I don’t have to worry about creating a Persistent Volume by hand — the provisioner takes care of creating those based on the Persistent Volume Claims.
But what if you didn’t want to back up to another PV, but to some other location? PGO is built to support many different options and, out of the box, you can push your backups to:
- Any Kubernetes supported storage class (which is what we’re using here)
- Amazon S3 (or S3 equivalents like MinIO)
- Google Cloud Storage (GCS)
- Azure Blob Storage
You can even push backups to multiple repositories at the same time, so you could keep a local backup and also push to the remote storage of your choice.
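A sketch of what two repositories might look like in the backups section: repo1 is a local volume, as in the hippo manifest, and repo2 is an S3 bucket. The bucket, endpoint, region, and Secret name are hypothetical placeholders, and the Secret is assumed to hold pgBackRest's S3 credential options (repo2-s3-key and repo2-s3-key-secret); check the PGO backup documentation for the exact Secret format your version expects.

```yaml
# Fragment of a PostgresCluster spec: one volume repo plus one S3 repo.
# Bucket, endpoint, region, and Secret name are hypothetical.
spec:
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: hippo-s3-creds   # assumed to hold the repo2 S3 credentials
      repos:
        - name: repo1              # local backups on a PersistentVolume
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
        - name: repo2              # remote backups in object storage
          s3:
            bucket: my-backup-bucket
            endpoint: s3.us-east-1.amazonaws.com
            region: us-east-1
```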
Now let’s check out the Persistent Volumes:
➜ kubectl get persistentvolumes
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   REASON   AGE
pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            Delete           Bound    default/hippo-repo1            local-path              12m
pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            Delete           Bound    default/hippo-00-l5gw-pgdata   local-path              12m
What’s interesting here? Notice that the access mode matches the PersistentVolumeClaim's. It’s also very nice that each PersistentVolume points to the PVC that has claimed it, just like the PVCs point to the PersistentVolumes they have claimed.
But what’s really interesting here is the Reclaim Policy. Remember when I said that the lifecycle of the PersistentVolume was independent of the pod, and then added “sort of”? This is that “sort of.”
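A Persistent Volume is independent of the Pod's lifecycle, but not independent of the Persistent Volume Claim's lifecycle. When the PVC is deleted, Kubernetes handles the PV according to its Reclaim Policy. With the Delete policy shown above, deleting the PVC deletes the PV, and the underlying storage, along with it.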
So what do you do if you want to delete your postgrescluster but want to keep the storage around to use for something later? You can accomplish this by changing the Reclaim Policy of those Persistent Volumes to Retain. If you do that and then delete your Postgres cluster, your persistent volumes will, well, persist.
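One way to make that change is a small patch against each PV. The sketch below is a patch file (the file name retain-patch.yaml is hypothetical) that sets only the reclaim policy field.

```yaml
# retain-patch.yaml (hypothetical file name): switch a PV's reclaim policy.
spec:
  persistentVolumeReclaimPolicy: Retain
```

You could apply it with something like kubectl patch pv pvc-9e19c77f-c111-4891-a1b5-776d23e06c18 --patch-file retain-patch.yaml, or pass the same change inline with kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'. Once its claim is deleted, a retained PV moves to the Released phase with its data intact, but it will not bind to a new claim until an administrator reclaims it manually. Alternatively, a Storage Class can set reclaimPolicy: Retain so that the PVs it provisions start out with that policy.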
Kubernetes was created first with stateless applications in mind, but the project has grown to embrace databases, with Kubernetes-native architecture that perfectly fits the needs of persisting data.
This is just an introduction to the ideas behind persistent storage on Kubernetes and the many options available to you running a Postgres instance on Kubernetes.
If all this is something you don’t want to handle yourself, that doesn’t mean you can’t run Postgres in Kubernetes. Our Postgres Operator has been supporting customers with stateful apps for over five years. Try our Operator today with our quickstart.
January 25, 2023