Deep PostgreSQL Thoughts: Resistance to Containers is Futile
Recently I ran across grand sweeping statements suggesting that containers are not ready for prime time as a vehicle for deploying your databases. The definition of "futile" is something like "serving no useful purpose; completely ineffective". See why I say this below, but in short: for all intents and purposes, you are probably already running your database in a "container". Therefore, your resistance is futile.
And I'm here to tell you that, at least in so far as PostgreSQL is concerned, those sweeping statements are patently false. At Crunchy Data we have many customers that are very successfully running large numbers of PostgreSQL clusters in containers. Examples of our success in this area can be found with IBM and SAS.
However, just as you better have a special license and skills if you want to drive an 18 wheeler down the highway at 70 MPH, you must ensure that you have the skills and knowledge (either yourself or on your team) to properly operate your infrastructure, whether it be on-prem or in the cloud. This has always been true, but the requisite knowledge and skills have changed a bit.
What is a Container?
Let's start by reviewing exactly what a container is, and what it is not. According to someone who ought to know, Jérôme Petazzoni (formerly of Docker fame), containers are made of "namespaces, cgroups, and a little bit of copy-on-write storage". Here is a slightly dated (in particular, it is cgroup v1 specific) but still very good video in which Jérôme explains the details. Among other quotes from that talk, there is this gem:
There is this high level approach where we say, well a container is a little bit like a lightweight virtual machine, and then we also say, well but a container is not a lightweight virtual machine, stop thinking that because that puts you in the wrong mindset...
That statement is important because it implies that the degree of "virtualization" of containers is actually less than that of VMs, which of course are completely virtualized environments.
The processes in a container are running directly under the auspices of the host kernel in particular cgroups, and with their own namespaces. The cgroups provide accounting and control of the use of host resources, and namespaces provide a perceived degree of isolation, but the abstraction is much more transparent than that of a virtual machine.
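You can see this for yourself on any modern Linux host, with no container runtime involved; every process, including your own shell, reports its cgroup and namespace membership under /proc:

```shell
# Show which cgroups the current process belongs to
# (one line per hierarchy on cgroup v1, a single line on cgroup v2)
cat /proc/self/cgroup

# List the namespaces the current process is running in
ls -l /proc/self/ns
```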
In fact, to tie back to my "resistance is futile" statement above, on modern Linux systems everything is running under cgroups and namespaces, even if not running in what you think of as a "container".
For example, on a recently provisioned RHEL 8 machine running PostgreSQL I see the following:
```
$ sudo -i
# ls -la /sys/fs/cgroup/*/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Jan 29 23:58 /sys/fs/cgroup/blkio/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Feb  1 17:41 /sys/fs/cgroup/devices/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Feb  1 13:52 /sys/fs/cgroup/memory/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Feb  1 17:41 /sys/fs/cgroup/pids/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Feb  1 17:41 /sys/fs/cgroup/systemd/system.slice/postgresql-12.service/tasks
# cat /sys/fs/cgroup/memory/system.slice/postgresql-12.service/tasks
6827
6829
6831
6832
6833
6834
6835
6836
# ps -fu postgres
UID          PID    PPID  C STIME TTY          TIME CMD
postgres    6827       1  0 Jan29 ?        00:00:02 /usr/pgsql-12/bin/postgres -D /var/lib/pgsql/12/data/
postgres    6829    6827  0 Jan29 ?        00:00:00 postgres: logger
postgres    6831    6827  0 Jan29 ?        00:00:00 postgres: checkpointer
postgres    6832    6827  0 Jan29 ?        00:00:02 postgres: background writer
postgres    6833    6827  0 Jan29 ?        00:00:02 postgres: walwriter
postgres    6834    6827  0 Jan29 ?        00:00:01 postgres: autovacuum launcher
postgres    6835    6827  0 Jan29 ?        00:00:02 postgres: stats collector
postgres    6836    6827  0 Jan29 ?        00:00:00 postgres: logical replication launcher
```
This is not "PostgreSQL running in a container", yet PostgreSQL is nonetheless running in several cgroups. Further:
```
# ll /proc/6827/ns/
total 0
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 cgroup -> 'cgroup:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 ipc -> 'ipc:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 mnt -> 'mnt:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 net -> 'net:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 pid -> 'pid:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 pid_for_children -> 'pid:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 user -> 'user:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 uts -> 'uts:'
# lsns
        NS TYPE   NPROCS   PID USER            COMMAND
4026531835 cgroup     95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531836 pid        95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531837 user       95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531838 uts        95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531839 ipc        95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531840 mnt        89     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531860 mnt         1    15 root            kdevtmpfs
4026531992 net        95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026532216 mnt         1   888 root            /usr/lib/systemd/systemd-udevd
4026532217 mnt         1   891 root            /sbin/auditd
4026532218 mnt         1   946 chrony          /usr/sbin/chronyd
4026532219 mnt         1  1015 root            /usr/sbin/NetworkManager --no-daemon
4026532287 mnt         1  1256 systemd-resolve /usr/lib/systemd/systemd-resolved
```
From this we can see that the PostgreSQL processes are also running in several namespaces, again despite the "fact" that PostgreSQL is "not running in a container".
So hopefully you see that any statement insinuating that you should not run PostgreSQL "in a container" flies in the face of reality.
Considerations When Using Containers
In my experience, the key issues you run into when running something like PostgreSQL in containers could be generally categorized as:
- OOM killer
- Storage
- Restarts and "in motion"
- Custom Deployments
As alluded to above, none of these issues are unique to containers, although they may be exacerbated by the expectations many organizations have about how the universe works once they switch to containers.
As an old database-curmudgeon(™) myself, I clearly remember the day when a small number of monolithic databases served up most of the critical data held by an organization. The hardware was expensive, and the teams of people catering to these systems were even more expensive. Careful thought, planning, testing, and processes were applied to the deployment of the hardware and databases. Failing over in a crisis was an "all hands on deck" and very manual evolution. The same was true of disaster recovery from backup.
Today the expectation is to "automate all the things". A relatively inexperienced application developer should be able to go to some kind of portal and push an "easy button" and have themselves a database complete with automatic failover, healing, monitoring, and backups, with disaster recovery not too many steps away.
Containerization and container-orchestration have gone a long way to making that expectation possible, and Crunchy Data has brought together considerable expertise in PostgreSQL, containers, Kubernetes, and Kubernetes Operators in order to make it a reality. But the existence of opinionated automation does not mean that your organization can abdicate all responsibility. These are very complex distributed systems, and they deserve well trained and experienced people to watch over them. In other words, your team still needs to know what they are doing if you want all this automation to be reliable.
Without further ado, let's address these issue categories one at a time.
OOM Killer
The OOM killer is nothing new -- it has been an issue for PostgreSQL users to worry about for at least 17 years now. However, there are some modern considerations to be aware of. Specifically, when operating in a container it is common to set cgroup memory controller limits. The same could apply when running on bare metal if such limits were set, but under containers it is much more common for that to be the case. Overall this is a very complex topic that deserves its own blog post: please see my previous post, Deep PostgreSQL Thoughts: The Linux Assassin.
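Those limits are ordinary cgroup knobs, not container magic. As an illustration, under systemd a memory cap on a PostgreSQL unit is a small drop-in file; the file name and values below are hypothetical examples for illustration, not tuning advice:

```shell
# Hypothetical systemd drop-in for a PostgreSQL service.
# MemoryMax sets a cgroup (v2) memory limit; OOMScoreAdjust makes the
# postmaster a less attractive OOM-killer target. Values are examples only.
cat <<'EOF' > postgresql-memory.conf
[Service]
MemoryAccounting=yes
MemoryMax=4G
OOMScoreAdjust=-1000
EOF
cat postgresql-memory.conf
```

In a real deployment this would live under a path like /etc/systemd/system/&lt;unit&gt;.service.d/, followed by a daemon-reload and service restart.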
Storage
Storage issues are also not new and not container specific. Yes, pretty much all containerized environments run on network-attached storage, but so do VMs and many bare metal installations. The issues with storage are typically related to being network attached, not to being "in a container".
A big missing piece in this brave new world is proper testing. Referring back to the days when databases were huge monolithic things attended by groups of people, deploying a new database on new hardware typically involved significant end-to-end testing. Like literally pulling the plug on the power while writing database records under heavy load. Or yanking the Fibre Channel connection between the server hardware and the storage array under similar conditions. These kinds of tests would find weak links in the chain between PostgreSQL and the spinning disks of rust used for persistent storage. If everything was properly configured, the tests would yield a database that recovered perfectly. On the other hand, if any layer was lying about having stored the data persistently, the database would be corrupted reliably enough that the configuration errors would be spotted and fixed prior to going to production.
Today's containerized environments have more layers that need to be tested and properly configured. But the fundamental issue is no different.
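A minimal sketch of the idea, portable to any Linux box: force a write all the way to stable storage and confirm the stack completes the fsync. (For serious validation use pg_test_fsync, which ships with PostgreSQL, plus genuine pull-the-plug testing; this probe only demonstrates the principle.)

```shell
# Rough durability probe: write 1MB and force it through to stable
# storage with fsync before dd reports success. A storage layer that
# lies about fsync is exactly what the pull-the-plug tests above catch.
dd if=/dev/zero of=fsync_probe.tmp bs=8k count=128 conv=fsync status=none
sync
rm -f fsync_probe.tmp
echo "fsync probe completed"
```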
Restarts and "in motion"
Restarts and "in motion" issues are usually related to container orchestration layers, not the containers themselves. Avoiding these types of issues comes down to "knowing what you are doing" with Kubernetes or whatever you are using. And to some extent the same issues exist with VMs when they are being orchestrated. It is possible to avoid these issues if you so choose.
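On Kubernetes, for instance, part of "knowing what you are doing" is telling the orchestrator which disruptions are acceptable. A PodDisruptionBudget is one such knob; the names and labels below are hypothetical:

```yaml
# Hypothetical PodDisruptionBudget: during voluntary disruptions such as
# node drains, never evict the last remaining PostgreSQL pod.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hippo-pgdb-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: hippo-pgdb
```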
Custom Deployments
As mentioned above, many organizations carry an implicit assumption that the switch to containers should come with an "easy button" that is nonetheless customizable exactly to their needs. They take a carefully crafted distributed system and overlay their own changes. Then, when they have operational or upgrade troubles, they wonder why it is hard to diagnose and fix. The situation reminds me of an adage commonly used in the PostgreSQL community when someone is doing something that is generally not recommended and/or unsupported: "You break it, you get to keep both halves." With paying customers we don't usually get to take quite such a hard line, but this is a common pain point, and we continue to add flexibility to our solution in order to mitigate the pain.
The world of computing is inexorably moving toward automating everything and distributing all the bits in containers. Don't fear it, embrace it. But make sure your team is up to the task, and partner with a good bodyguard -- like Crunchy Data -- to ensure reliability and success.
February 18, 2021