Stateful Postgres Storage Using Kubernetes

Ben Blattberg

Kubernetes was developed originally as an orchestration system for stateless applications. Today, Kubernetes is the backbone of countless full stack applications with, notably, a database as part of the stack. So, a question we often hear is:

How can Kubernetes be the foundation of that most stateful application of all, the database?

Kubernetes & Storage

Ephemeral Pods

Let’s say you maintain a Postgres database and you’ve been tasked with moving it to Kubernetes. You could just start up a Kubernetes pod running a Postgres image, load the data, and call it a day.

As soon as that pod goes away, so will that small but critical database, because the database storage existed as part of that ephemeral pod.

When you created that pod, you told the underlying computer to reserve certain resources for it: a certain amount of compute power, a certain amount of memory, and, automatically, a certain amount of storage. But as soon as that pod goes away, all of that is released back into the pool.

And pods do go away even when you don’t want them to. Maybe a pod exceeded the resource limits you gave it, or maybe the process hit a fatal exception, or—well, there are a lot of ways a pod can die. Let’s call this the first lesson of Kubernetes: pods are ephemeral—as the old saying goes, consider these cattle, not pets. While this model of deployment is ideal for application services, we need something different to handle the database in Kubernetes.

Persistent Volumes

So, how can we create a database on Kubernetes and not worry about the ephemeral pod? The answer is “Volumes”, or, more specifically, certain types of volumes that are independent of the pod lifecycle.

For an example of a volume type that is not independent, let’s take emptyDir. The Kubernetes doc on Volume types lets us know that when a pod with an emptyDir is deleted, the data in there will be “deleted permanently”. For backing a Postgres instance, this is not a good idea.
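To make that concrete, here’s a minimal sketch of the naive setup (the pod name is a placeholder, and the postgres image requires a password): everything under that emptyDir vanishes with the pod.

apiVersion: v1
kind: Pod
metadata:
  name: postgres-ephemeral
spec:
  containers:
    - name: postgres
      image: postgres:14
      env:
        - name: POSTGRES_PASSWORD
          value: example          # placeholder; required by the postgres image
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: pgdata
      emptyDir: {}                # deleted permanently along with the pod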

What we want here is a volume that won’t go away when the pods are removed. Luckily, Kubernetes has the concept of a Persistent Volume (PersistentVolume or PV): a volume that will persist no matter what happens with the pod (sort of — we’ll get into that later).

What’s even better, a Persistent Volume is an abstraction for any kind of storage. So, do you want to put your storage somewhere remote, like on AWS, Azure, Google, etc.? Then you can use a Persistent Volume. But maybe you want to use some of your Kubernetes node’s own storage? Then you can also use a Persistent Volume.

As far as Kubernetes knows, a Persistent Volume is just some piece of storage somewhere and its lifecycle is independent of the pod that’s using it. So if the pod goes away — say, it hits a memory limit and is OOMkilled — the storage is still there, ready to be used by your Postgres pod when it regenerates.
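If you were provisioning storage by hand, a Persistent Volume is just another Kubernetes object. Here’s a sketch backed by a directory on a node (hostPath is fine for a demo, not for production; the name and path are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pg-demo-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data/pg-demo       # placeholder directory on the node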

But first, we have to tell the Postgres pod to use that storage and we have to tell other pods to leave our storage alone. And we do that with a Persistent Volume Claim (PersistentVolumeClaim or PVC).

Persistent Volume Claim

A Persistent Volume Claim is a claim on a persistent volume — and not just any persistent volume. After all, it wouldn’t be great for your 20G database if your Postgres pod tried to claim and use 1G of storage.

A Persistent Volume Claim lets you request a certain amount of storage. But a Persistent Volume Claim is more than just a specification of a certain amount of storage. You might also want to specify the access mode for this storage or the StorageClass. For example, if you have data that changes quickly, you might want a different storage option than if you are taking long-lasting backups that never change.
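As a sketch, a claim asking for 20Gi of a particular StorageClass might look like this (the names are placeholders, and the StorageClass has to exist in your cluster):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-demo-pvc
spec:
  accessModes:
    - ReadWriteOnce               # read-write, mountable by one node at a time
  storageClassName: standard      # placeholder; list yours with kubectl get storageclass
  resources:
    requests:
      storage: 20Gi               # room for that 20G database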

Storage Classes

By now you may have noticed that the Kubernetes developers are pretty good at using descriptive names and that’s true here: a storage class is a class of storage. That is, it is a category that defines certain behaviors and attributes of the storage. Or as the Kubernetes docs on Storage Class say, "Different classes might map to quality-of-service levels, or to backup policies, or to arbitrary policies determined by the cluster administrators."

One of those attributes is especially important: the provisioner. That’s the field on the Storage Class that tells Kubernetes how to dynamically provision new Persistent Volumes.

If you have a Kubernetes cluster that somehow doesn’t have any Storage Classes — or is missing a provisioner — then you won’t be able to dynamically create Volumes. In that case, the Kubernetes admin will have to manually provision those Volumes.
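For example, here’s a sketch of a StorageClass for gp3 volumes on AWS using the EBS CSI driver (the name is a placeholder, and the parameters are provisioner-specific):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-gp3                  # placeholder
provisioner: ebs.csi.aws.com      # how Kubernetes dynamically provisions PVs
parameters:
  type: gp3                       # provisioner-specific attribute
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer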

The main reason why I’m mentioning Storage Classes to you — the Postgres expert who needs to migrate your database to Kubernetes — is to remind you that where you put your data matters. As I said above, Kubernetes doesn’t care if your Persistent Volume is on-prem or in any particular cloud provider — but each of these options has different storage backends and those different storage backends will offer different options.

Mid-Article Kubernetes Summary

I’ve thrown a lot at you, so to summarize:

  • Pods die, and storage that lives inside an ephemeral pod dies with it
  • If you want your data to survive a pod death, you need a Persistent Volume
  • In order for a Pod to use a Persistent Volume, you need to wire the Pod to the Persistent Volume through a Persistent Volume Claim
  • Different Storage Classes offer different options

Putting it all together:

  • Just like a Kubernetes cluster has compute and memory resources available that you can request for your pod, a Kubernetes cluster may have storage that you can request. The storage exists as a Persistent Volume and you request a certain amount of storage with a Persistent Volume Claim — a minimal sketch of that wiring follows below.
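Here’s that wiring as a bare pod spec, reusing the placeholder claim from above (an operator would normally generate something like this for you):

apiVersion: v1
kind: Pod
metadata:
  name: postgres-persistent
spec:
  containers:
    - name: postgres
      image: postgres:14
      env:
        - name: POSTGRES_PASSWORD
          value: example              # placeholder
      volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: pgdata
      persistentVolumeClaim:
        claimName: pg-demo-pvc        # wires the pod to the claimed storage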

Postgres on Kubernetes

With that in mind, how would you architect a Postgres instance with persistent storage on Kubernetes? Let’s get a napkin:

[Diagram: a single Postgres pod backed by one Persistent Volume]

That’s OK from a Kubernetes perspective: we have a pod that runs the database and we have a Persistent Volume that holds the storage. If the pod goes away, the data is still there in the volume.

But from a Postgres perspective, we have Postgres saving our data and our WAL files (which we need for backups and point-in-time recovery) to the same volume. That’s not great for some recovery scenarios. So let’s add a little more redundancy to our system by pushing our WAL files to another Persistent Volume.

[Diagram: a Postgres pod with separate Persistent Volumes for data and for WAL files]

That’s better for storage persistence and recovery from backup. But we probably want to add a Postgres replica for high-availability. What does that look like with persistent storage?

[Diagram: a Postgres primary and replica, each with its own data volume, plus a Persistent Volume for WAL/backup storage]

This is a little generic: I’m not getting into what’s pushing the WAL files to the Persistent Volume for backup storage. Theoretically, you might back up in some other ways. But the general lesson here is that you probably want to have your primary storage separate from your backup storage. Maybe you want it really separate? You could use something like pgBackRest, which can push files to remote cloud-based storage.

[Diagram: Postgres pods using pgBackRest to push backups to remote cloud storage]

Again, the general idea here is you likely want to have two separate storage volumes for your database and your recovery material. There are a few ways to do that. I mean, if you wanted to, you could exec into your Postgres pod regularly, run pg_dump, and copy the output somewhere. That’s not a production-ready solution, though.
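For the record, that not-production-ready approach would look something like this (the pod, user, and database names are placeholders):

$ kubectl exec postgres-persistent -- pg_dump --username postgres mydb > mydb-backup.sql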

The Postgres Operator & Storage

One of the great things about using an operator is that a lot of the storage handling is solved for you. With the Postgres Operator (PGO), when you spin up a Postgres instance, PGO can create the pod and the PVC according to your specifications and according to the needs of your Kubernetes cluster.

For instance, maybe you already have a Persistent Volume from a previous Postgres backup and you want to use that data to bootstrap a new cluster — we can do that. Or maybe you want to dynamically create a new Persistent Volume using a particular Storage Class or just want to use the default Storage Class — well, we can do that too with PGO.

(As a reminder, as I noted above, different commercial Kubernetes services offer different options for Storage Classes; and in general, up-to-date clusters on AWS EKS, Azure AKS, Google GKE, etc., will have a default Storage Class. But you can always — and probably should — check what the Storage Classes are with kubectl get storageclass.)

PGO creates Pods and PVCs for you

Here’s example YAML for a very basic Postgres instance, with one Postgres pod (no replicas):

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
  namespace: postgres-operator
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec: # storage for the pgBackRest backup repository
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
  instances:
    - dataVolumeClaimSpec: # storage for the Postgres data directory
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
      name: '' # left blank; PGO fills in a generated instance name
      replicas: 1
  postgresVersion: 14

Notice that the PostgresCluster object has a volumeClaimSpec under spec.backups and a dataVolumeClaimSpec under spec.instances. Those are separate and independent of each other, so you can define each differently.
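Incidentally, this is also where you could implement the data/WAL split from the earlier diagrams: recent PGO versions accept a walVolumeClaimSpec per instance set, which puts the WAL on its own Persistent Volume. A sketch of just the instances fragment:

instances:
  - name: ''
    replicas: 1
    dataVolumeClaimSpec:            # storage for the Postgres data directory
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
    walVolumeClaimSpec:             # separate storage for WAL files
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi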

Once I create that Postgres instance, I can check on the pods:

$ kubectl get pods --namespace postgres-operator
NAME                      READY   STATUS      RESTARTS   AGE
hippo-repo-host-0         2/2     Running     0          3m23s
hippo-00-6wh4-0           4/4     Running     0          3m23s

Wait, why do I have two pods if I only have one Postgres instance with no replicas? The hippo-repo-host-0 is running pgBackRest, our preferred backup solution, which is connected to its own local PersistentVolume. We can check the PersistentVolumeClaims to see that in action:

$ kubectl get persistentvolumeclaims --namespace postgres-operator
NAME                   STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hippo-00-l5gw-pgdata   Bound         pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            local-path     85s
hippo-repo1            Bound         pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            local-path     86s

Notice also that those PVCs have a status of Bound and tell us which volume they are bound to. They have a set capacity of 1Gi (as I requested in the original YAML above) and a specified access mode of RWO, or ReadWriteOnce, meaning the volume can be mounted read-write by a single node at a time.

And that StorageClass “local-path”? That’s the default StorageClass on this Kubernetes cluster that I’m using:

$ kubectl describe storageclass local-path
Name:                  local-path
IsDefaultClass:        Yes
Provisioner:           rancher.io/local-path

Because I have a default Storage Class with a provisioner, I don’t have to worry about creating a Persistent Volume by hand — the provisioner takes care of creating those based on the Persistent Volume Claims.

But what if you didn’t want to back up to another PV, but wanted to push your backups to some other location? PGO is built to support many different options and, out of the box, you can push your backups to:

  • Any Kubernetes-supported storage class (which is what we’re using here)
  • Amazon S3 (or S3 equivalents like MinIO)
  • Google Cloud Storage (GCS)
  • Azure Blob Storage

You can even push backups to multiple repositories at the same time — so you could take a local backup and push to remote storage of your choice.
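As a sketch, adding an S3-backed repo2 alongside the local repo1 might look like the fragment below (the bucket, endpoint, and region are placeholders, and the pgBackRest credentials would be supplied separately through spec.backups.pgbackrest.configuration):

backups:
  pgbackrest:
    repos:
      - name: repo1                 # local backup repo on a PVC
        volume:
          volumeClaimSpec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
      - name: repo2                 # second copy pushed to S3
        s3:
          bucket: my-backup-bucket            # placeholder
          endpoint: s3.us-east-1.amazonaws.com
          region: us-east-1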

Now let’s check out the Persistent Volumes:

$ kubectl get persistentvolumes
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                          STORAGECLASS   REASON   AGE
pvc-83a217c0-bffa-4e66-8ff6-15bbd5fadf07   1Gi        RWO            Delete           Bound    default/hippo-repo1            local-path              12m
pvc-9e19c77f-c111-4891-a1b5-776d23e06c18   1Gi        RWO            Delete           Bound    default/hippo-00-l5gw-pgdata   local-path              12m

What’s interesting here? Notice that the Capacity and Access Mode match the PersistentVolumeClaims’. It’s also very nice that each PersistentVolume points to the PVC that has claimed it, just as each PVC points to the PersistentVolume it has claimed.

But what’s really interesting here is the Reclaim Policy. Remember when I said that the lifecycle of the PersistentVolume was independent of the Pod and added “sort of”? This is that “sort of.”

A Persistent Volume is independent of the Pod’s lifecycle, but not independent of the Persistent Volume Claim’s lifecycle. When the PVC is deleted, Kubernetes will handle the PV according to the Reclaim Policy; with the Delete policy shown above, the PV and its underlying storage are removed along with the claim.

So what do you do if you want to delete your postgrescluster but keep the storage around to use for something later? You can accomplish this by changing the Reclaim Policy of those Persistent Volumes to Retain. If you do that and then delete your postgrescluster, your persistent volumes will, well, persist.
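Changing that policy is a one-liner; for example, for the pgdata volume from the listing above:

$ kubectl patch pv pvc-9e19c77f-c111-4891-a1b5-776d23e06c18 \
    -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'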

Summary

Kubernetes was created first with stateless applications in mind, but the project has grown to embrace databases, with Kubernetes-native storage primitives (Persistent Volumes, Persistent Volume Claims, and Storage Classes) that fit the needs of persisting data.

This is just an introduction to the ideas behind persistent storage on Kubernetes and the many options available to you running a Postgres instance on Kubernetes.

If all this is something you don’t want to handle yourself, that doesn’t mean you can’t run Postgres in Kubernetes. Our Postgres Operator has been supporting customers with stateful apps for over five years. Try our Operator today with our quickstart.

January 25, 2023