The desire to use Pod tolerations to schedule Postgres instances sometimes comes up around complex Kubernetes deployments. To address this feedback, we added support for tolerations to the 4.6 release of the Postgres Operator along with improvements to using node affinity.
To use tolerations with PostgreSQL deployments, it helps to understand some of the mechanics behind several Kubernetes features to get the desired result of deploying PostgreSQL to a specific node group.
Let's take a look at how we can use Pod tolerations with the PostgreSQL Operator to create different production topologies. First, let's cover how Kubernetes taints, tolerations, and node affinity can work together.
Node affinity, taints, and tolerations
One of Kubernetes' primary jobs is to schedule Pods, the fundamental units of execution, to nodes. Kubernetes will perform Pod scheduling without any guidance, though its interface lets you give it hints on how and where to schedule Pods.
Node affinity (which the PostgreSQL Operator has supported for quite some time) provides Kubernetes guidance for how it can schedule a Pod. In the context of the Postgres Operator, there are two types of node affinity that it supports:
required: Kubernetes must schedule a Pod to a node that matches the node affinity rule. If it cannot schedule the Pod to that node, it must not schedule it at all.
preferred: Kubernetes should try to schedule a Pod to a node that matches the node affinity rule. If it cannot, it should attempt to schedule it elsewhere.
This distinction is important. Often people wonder why a Pod is not scheduled to a particular node even though node affinity is set. The likely culprit is that the node affinity has a preferred rule instead of a required one.
While node affinity rules tell Kubernetes to "try to schedule this Pod here," node taints do the opposite. A node taint tells the Kubernetes scheduler that a node is "off limits" to any Pod that does not meet certain conditions. In other words, a taint lets you put a "lock" on one or more nodes so that Kubernetes can only schedule Pods to them if those Pods have a particular "key." You can read more about tainting nodes in the Kubernetes documentation.
Pod tolerations provide the "keys" to allowing Kubernetes to schedule Pods to tainted nodes. If a Pod has a Toleration that matches the Taint of a node, then Kubernetes knows it can schedule the Pod to that node.
Note that just because a Pod has a matching toleration for a node does not mean that Kubernetes will schedule the Pod to that node. Tolerations only give permission for Pods to be scheduled to tainted nodes. Node affinity provides Kubernetes guidance on where to actually schedule the Pods.
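To make the interplay concrete, here is a deliberately simplified Python model of the rules described above. It is only a sketch and not how the Kubernetes scheduler is implemented: the names Taint, Toleration, Node, and candidate_nodes are made up for illustration, every taint is treated as a hard lock, and node affinity is reduced to a single label match.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: Optional[str] = None


@dataclass(frozen=True)
class Toleration:
    key: str
    effect: str
    value: Optional[str] = None  # None models an "existence" toleration


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)
    taints: list = field(default_factory=list)


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """A toleration is the "key" for a taint with the same key and effect."""
    if toleration.key != taint.key or toleration.effect != taint.effect:
        return False
    return toleration.value is None or toleration.value == taint.value


def candidate_nodes(nodes, tolerations, affinity=None, required=False):
    """Return the names of nodes a Pod may land on, most preferred first."""
    # Taints are locks: every taint on a node needs a matching toleration.
    allowed = [
        n for n in nodes
        if all(any(tolerates(t, taint) for t in tolerations) for taint in n.taints)
    ]
    if affinity is None:
        return [n.name for n in allowed]
    key, value = affinity
    matching = [n for n in allowed if n.labels.get(key) == value]
    if required:
        # Required affinity: a matching node or nothing at all.
        return [n.name for n in matching]
    # Preferred affinity: matching nodes first, the rest as fallback.
    rest = [n for n in allowed if n not in matching]
    return [n.name for n in matching + rest]


nodes = [
    Node("general-1"),
    Node("db-1", labels={"db": "01"}, taints=[Taint("db", "NoSchedule", "01")]),
]
tols = [Toleration("db", "NoSchedule", "01")]

print(candidate_nodes(nodes, []))                                        # ['general-1']
print(candidate_nodes(nodes, tols))                                      # ['general-1', 'db-1']
print(candidate_nodes(nodes, tols, affinity=("db", "01")))               # ['db-1', 'general-1']
print(candidate_nodes(nodes, tols, affinity=("db", "01"), required=True))  # ['db-1']
```

The second call shows the point of the note above: the toleration unlocks db-1, but nothing steers the Pod toward it until an affinity rule is added.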
With these concepts, let's look at how we can use Pod tolerations to schedule Postgres clusters to tainted nodes.
Deploying PostgreSQL with Pod tolerations
The Postgres Operator supports two ways of managing tolerations for a PostgreSQL cluster: through the pgo client or through a GitOps workflow using a custom resource.
For the example below, we will use the pgo client. The pgo create cluster, pgo update cluster, and pgo scale commands support the --toleration flag, which allows for the addition of one or more tolerations to a PostgreSQL cluster. Values accepted by the --toleration flag use the following format:
rule:Effect
where a rule can represent existence (e.g. key) or equality (key=value), and Effect is one of the Kubernetes taint effects: NoSchedule, PreferNoSchedule, or NoExecute.
For example, to add two tolerations to a new PostgreSQL cluster, one that is an existence toleration for a key of ssd and the other that is an equality toleration for a key/value pair of db/01, you can run the following command:
pgo create cluster hippo \
  --toleration=ssd:NoSchedule \
  --toleration=db=01:NoSchedule
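As a rough illustration of how the two --toleration values in the command above correspond to Kubernetes toleration fields (key, operator, value, effect), here is a small Python sketch. The parse_toleration helper is hypothetical and is not the Operator's own parsing code; it assumes the last colon separates the rule from the effect and that an = in the rule marks an equality toleration.

```python
# The three taint effects defined by Kubernetes.
EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def parse_toleration(flag_value: str) -> dict:
    """Turn a string such as 'db=01:NoSchedule' into a toleration mapping."""
    rule, sep, effect = flag_value.rpartition(":")
    if not sep or not rule:
        raise ValueError(f"expected rule:Effect, got {flag_value!r}")
    if effect not in EFFECTS:
        raise ValueError(f"unknown effect {effect!r}")
    key, eq, value = rule.partition("=")
    if eq:
        # key=value: tolerate only taints with this exact key and value.
        return {"key": key, "operator": "Equal", "value": value, "effect": effect}
    # Bare key: tolerate any taint with this key, whatever its value.
    return {"key": key, "operator": "Exists", "effect": effect}


print(parse_toleration("ssd:NoSchedule"))
# {'key': 'ssd', 'operator': 'Exists', 'effect': 'NoSchedule'}
print(parse_toleration("db=01:NoSchedule"))
# {'key': 'db', 'operator': 'Equal', 'value': '01', 'effect': 'NoSchedule'}
```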
Now let's say you have a group of nodes with the taint db/02 that you are reserving for replicas. You can add a replica to the hippo cluster with a toleration of db/02 with the following command:
pgo scale hippo --toleration=db=02:NoSchedule
If you want to update tolerations on an existing cluster, you can do so either by modifying the custom resources directly or with the pgo update cluster command. pgo update cluster can also remove a toleration if it detects a - at the end of the toleration effect.
For example, to add a toleration of nvme:NoSchedule and remove the toleration ssd:NoSchedule, you could run the following command:
pgo update cluster hippo \
  --toleration=nvme:NoSchedule \
  --toleration=ssd:NoSchedule-
The PostgreSQL Operator will roll out any changes to the appropriate instances.
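The add-and-remove convention can be sketched in a few lines of Python. This only models the semantics described above (a trailing - removes a toleration, anything else adds it); apply_toleration_flags is a hypothetical helper, and the order in which the real client processes flags is an assumption.

```python
def apply_toleration_flags(current, flags):
    """Return the toleration list after applying --toleration values in order."""
    result = list(current)
    for flag in flags:
        if flag.endswith("-"):
            # Trailing "-" after the effect: drop the matching toleration.
            target = flag[:-1]
            result = [t for t in result if t != target]
        elif flag not in result:
            result.append(flag)
    return result


before = ["ssd:NoSchedule", "db=01:NoSchedule"]
after = apply_toleration_flags(before, ["nvme:NoSchedule", "ssd:NoSchedule-"])
print(after)  # ['db=01:NoSchedule', 'nvme:NoSchedule']
```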
Mixing in node affinity
Now, even though you may have given your Postgres cluster the "keys" for deployment to nodes with specific taints, Kubernetes may not actually schedule its Pods there. Tolerations only give you permission to deploy. Node affinity gives Kubernetes rules on where to actually deploy Pods.
Using the previous example, let's say that we want to deploy our hippo Postgres cluster to two different node groups: one with node label db=01 and one with node label db=02. Note that while these have the same names as the taints, node labels are not the same as taints. This is to illustrate how to use node affinity to guide Kubernetes to deploy our PostgreSQL instances.
We want to force Kubernetes to deploy each Postgres instance to specific nodes. We can use the --node-affinity flag together with --node-affinity-type=required to have Kubernetes build out our deployment topology:
pgo create cluster hippo \
  --toleration=ssd:NoSchedule \
  --toleration=db=01:NoSchedule \
  --node-affinity=db=01 \
  --node-affinity-type=required
pgo scale hippo \
  --toleration=ssd:NoSchedule \
  --toleration=db=02:NoSchedule \
  --node-affinity=db=02 \
  --node-affinity-type=required
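For readers who want to see what a required versus preferred rule looks like on the Kubernetes side, the Python sketch below builds the corresponding nodeAffinity structure for a label such as db=01. The field names are those of the Kubernetes Pod spec; the node_affinity helper, the default weight of 1, and the idea that the Operator produces exactly this shape are assumptions made for illustration.

```python
import json


def node_affinity(label: str, affinity_type: str = "preferred", weight: int = 1) -> dict:
    """Build a Pod-spec style nodeAffinity block for a 'key=value' node label."""
    key, _, value = label.partition("=")
    term = {"matchExpressions": [{"key": key, "operator": "In", "values": [value]}]}
    if affinity_type == "required":
        # Hard rule: no matching node means the Pod stays unscheduled.
        rule = {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [term]
            }
        }
    elif affinity_type == "preferred":
        # Soft rule: matching nodes score higher, others remain eligible.
        rule = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": weight, "preference": term}
            ]
        }
    else:
        raise ValueError(f"unknown affinity type {affinity_type!r}")
    return {"nodeAffinity": rule}


print(json.dumps(node_affinity("db=01", "required"), indent=2))
```

Comparing the output for "required" and "preferred" makes it easier to spot, when inspecting a Pod, which kind of rule is actually in effect.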
Kubernetes tolerations and node affinity, coupled with the Postgres Operator, are a powerful combination for creating sophisticated deployment strategies for production PostgreSQL clusters. You should make sure you understand how these tools can affect high availability when designing a production environment for your data.
Pod tolerations do allow for your PostgreSQL instances to take advantage of hardware that you want to reserve for your databases and help you to leverage the power of Kubernetes for your Postgres deployments.
Jonathan S. Katz
February 12, 2021