Deep PostgreSQL Thoughts: Resistance to Containers is Futile
Recently I ran across grand sweeping statements suggesting that containers are not ready for prime time as a vehicle for deploying your databases. The definition of "futile" is something like "serving no useful purpose; completely ineffective". See why I say this below, but in short: for all intents and purposes, you are probably already running your database in a "container". Therefore, your resistance is futile.
And I'm here to tell you that, at least in so far as PostgreSQL is concerned, those sweeping statements are patently false. At Crunchy Data we have many customers that are very successfully running large numbers of PostgreSQL clusters in containers. Examples of our success in this area can be found with IBM and SAS.
However, just as you better have a special license and skills if you want to drive an 18 wheeler down the highway at 70 MPH, you must ensure that you have the skills and knowledge (either yourself or on your team) to properly operate your infrastructure, whether it be on-prem or in the cloud. This has always been true, but the requisite knowledge and skills have changed a bit.
What is a Container?
Let's start by reviewing exactly what a container is, and what it is not. According to someone who ought to know, Jérôme Petazzoni (formerly of Docker fame), containers are made of "namespaces, cgroups, and a little bit of copy-on-write storage". Here is a slightly dated (in particular, it is cgroup v1 specific) but still very good video in which Jérôme explains the details. Among other quotes from that talk, there is this gem:
There is this high level approach where we say, well a container is a little bit like a lightweight virtual machine, and then we also say, well but a container is not a lightweight virtual machine, stop thinking that because that puts you in the wrong mindset...
That statement is important because it implies that the degree of "virtualization" of containers is actually less than that of VMs, which of course are completely virtualized environments.
The processes in a container are running directly under the auspices of the host kernel in particular cgroups, and with their own namespaces. The cgroups provide accounting and control of the use of host resources, and namespaces provide a perceived degree of isolation, but the abstraction is much more transparent than that of a virtual machine.
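You can see this for yourself on any modern Linux host, with no container runtime involved; every process, including your own shell, reports its cgroup and namespace membership under /proc:

```shell
# Show which cgroups the current process belongs to
# (one line per hierarchy on cgroup v1, a single line on cgroup v2)
cat /proc/self/cgroup

# List the namespaces the current process is running in
ls -l /proc/self/ns
```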
In fact, to tie back to my "resistance is futile" statement above, on modern Linux systems everything is running under cgroups and namespaces, even if not running in what you think of as a "container".
For example, on a recently provisioned RHEL 8 machine running PostgreSQL I see the following:
```
$ sudo -i
# ls -la /sys/fs/cgroup/*/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Jan 29 23:58 /sys/fs/cgroup/blkio/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Feb  1 17:41 /sys/fs/cgroup/devices/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Feb  1 13:52 /sys/fs/cgroup/memory/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Feb  1 17:41 /sys/fs/cgroup/pids/system.slice/postgresql-12.service/tasks
-rw-r--r--. 1 root root 0 Feb  1 17:41 /sys/fs/cgroup/systemd/system.slice/postgresql-12.service/tasks
# cat /sys/fs/cgroup/memory/system.slice/postgresql-12.service/tasks
6827
6829
6831
6832
6833
6834
6835
6836
# ps -fu postgres
UID          PID    PPID  C STIME TTY          TIME CMD
postgres    6827       1  0 Jan29 ?        00:00:02 /usr/pgsql-12/bin/postgres -D /var/lib/pgsql/12/data/
postgres    6829    6827  0 Jan29 ?        00:00:00 postgres: logger
postgres    6831    6827  0 Jan29 ?        00:00:00 postgres: checkpointer
postgres    6832    6827  0 Jan29 ?        00:00:02 postgres: background writer
postgres    6833    6827  0 Jan29 ?        00:00:02 postgres: walwriter
postgres    6834    6827  0 Jan29 ?        00:00:01 postgres: autovacuum launcher
postgres    6835    6827  0 Jan29 ?        00:00:02 postgres: stats collector
postgres    6836    6827  0 Jan29 ?        00:00:00 postgres: logical replication launcher
```
This is not "PostgreSQL running in a container", yet PostgreSQL is nonetheless running in several cgroups. Further:
```
# ll /proc/6827/ns/
total 0
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 cgroup -> 'cgroup:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 ipc -> 'ipc:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 mnt -> 'mnt:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 net -> 'net:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 pid -> 'pid:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 pid_for_children -> 'pid:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 user -> 'user:'
lrwxrwxrwx. 1 postgres postgres 0 Feb  1 17:45 uts -> 'uts:'
# lsns
        NS TYPE   NPROCS   PID USER            COMMAND
4026531835 cgroup     95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531836 pid        95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531837 user       95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531838 uts        95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531839 ipc        95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531840 mnt        89     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026531860 mnt         1    15 root            kdevtmpfs
4026531992 net        95     1 root            /usr/lib/systemd/systemd --switched-root --system --deserialize 17
4026532216 mnt         1   888 root            /usr/lib/systemd/systemd-udevd
4026532217 mnt         1   891 root            /sbin/auditd
4026532218 mnt         1   946 chrony          /usr/sbin/chronyd
4026532219 mnt         1  1015 root            /usr/sbin/NetworkManager --no-daemon
4026532287 mnt         1  1256 systemd-resolve /usr/lib/systemd/systemd-resolved
```
From this we can see that the PostgreSQL processes are also running in several namespaces, again despite the "fact" that PostgreSQL is "not running in a container".
So hopefully you see that any statement insinuating that you should not run PostgreSQL "in a container" flies in the face of reality.
Considerations When Using Containers
In my experience, the key issues you run into when running something like PostgreSQL in containers could be generally categorized as:
- OOM killer
- Storage
- Restarts and "in motion"
- Custom Deployments
As alluded to above, none of these issues are unique to containers, although they may be exacerbated by the expectations many organizations have about how the universe works once they switch to containers.
As an old database-curmudgeon(™) myself, I clearly remember the day when a small number of monolithic databases served up most of the critical data held by an organization. The hardware was expensive, and the teams of people catering to these systems were even more expensive. Careful thought, planning, testing, and processes were applied to the deployment of the hardware and databases. Failing over in a crisis was an "all hands on deck" and very manual evolution. The same was true of disaster recovery from backup.
Today the expectation is to "automate all the things". A relatively inexperienced application developer should be able to go to some kind of portal and push an "easy button" and have themselves a database complete with automatic failover, healing, monitoring, and backups, with disaster recovery not too many steps away.
Containerization and container-orchestration have gone a long way to making that expectation possible, and Crunchy Data has brought together considerable expertise in PostgreSQL, containers, Kubernetes, and Kubernetes Operators in order to make it a reality. But the existence of opinionated automation does not mean that your organization can abdicate all responsibility. These are very complex distributed systems, and they deserve well trained and experienced people to watch over them. In other words, your team still needs to know what they are doing if you want all this automation to be reliable.
Without further ado, let's address these issue categories one at a time.
OOM Killer
The OOM killer is nothing new -- it has been an issue for PostgreSQL users to worry about for at least 17 years now. However, there are some modern considerations to be aware of. Specifically, when operating in a container it is common to set cgroup memory controller limits. The same could apply when running on bare metal if such limits were set, but under containers it is much more common for that to be the case. Overall this is a very complex topic that deserves its own blog post: please see my previous post, Deep PostgreSQL Thoughts: The Linux Assassin.
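Those limits are ordinary cgroup knobs, not container magic. As an illustration, under systemd a memory cap on a PostgreSQL unit is a small drop-in file; the file name and values below are hypothetical examples for illustration, not tuning advice:

```shell
# Hypothetical systemd drop-in for a PostgreSQL service.
# MemoryMax sets a cgroup (v2) memory limit; OOMScoreAdjust makes the
# postmaster a less attractive OOM-killer target. Values are examples only.
cat <<'EOF' > postgresql-memory.conf
[Service]
MemoryAccounting=yes
MemoryMax=4G
OOMScoreAdjust=-1000
EOF
cat postgresql-memory.conf
```

In a real deployment this would live under a path like /etc/systemd/system/&lt;unit&gt;.service.d/, followed by a daemon-reload and service restart.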
Storage
Storage issues are also not new and not container specific. Yes, pretty much all containerized environments run on network-attached storage, but so do VMs and many bare metal installations. The issues with storage are typically related to being network attached, not to being "in a container".
A big missing piece in this brave new world is proper testing. Referring back to the days when databases were huge monolithic things attended by groups of people, deploying a new database on new hardware typically involved significant end-to-end testing. Like literally pulling the plug on the power while writing database records under heavy load. Or yanking the Fibre Channel connection between the server hardware and the storage array under similar conditions. These kinds of tests would find weak links in the chain between PostgreSQL and the spinning disks of rust used for persistent storage. If everything was properly configured, the tests would yield a database that recovered perfectly. On the other hand, if any layer was lying about having stored the data persistently, the database would be corrupted reliably enough that the configuration errors would be spotted and fixed prior to going to production.
Today's containerized environments have more layers that need to be tested and properly configured. But the fundamental issue is no different.
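A minimal sketch of the idea, portable to any Linux box: force a write all the way to stable storage and confirm the stack completes the fsync. (For serious validation use pg_test_fsync, which ships with PostgreSQL, plus genuine pull-the-plug testing; this probe only demonstrates the principle.)

```shell
# Rough durability probe: write 1MB and force it through to stable
# storage with fsync before dd reports success. A storage layer that
# lies about fsync is exactly what the pull-the-plug tests above catch.
dd if=/dev/zero of=fsync_probe.tmp bs=8k count=128 conv=fsync status=none
sync
rm -f fsync_probe.tmp
echo "fsync probe completed"
```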
Restarts and "in motion"
Restarts and "in motion" issues are usually related to container orchestration layers, not the containers themselves. Avoiding these types of issues comes down to "knowing what you are doing" with Kubernetes or whatever you are using. And to some extent the same issues exist with VMs when they are being orchestrated. It is possible to avoid these issues if you so choose.
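On Kubernetes, for instance, part of "knowing what you are doing" is telling the orchestrator which disruptions are acceptable. A PodDisruptionBudget is one such knob; the names and labels below are hypothetical:

```yaml
# Hypothetical PodDisruptionBudget: during voluntary disruptions such as
# node drains, never evict the last remaining PostgreSQL pod.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: hippo-pgdb-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: hippo-pgdb
```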
Custom Deployments
As mentioned above, many organizations carry an implicit assumption that the switch to containers should come with an "easy button" that is nonetheless customizable exactly to their needs. They take a carefully crafted distributed system and overlay their own changes. Then, when they have operational or upgrade troubles, they wonder why it is hard to diagnose and fix. The situation reminds me of an adage commonly used in the PostgreSQL community when someone is doing something that is generally not recommended and/or unsupported: "You break it, you get to keep both halves." With paying customers we don't usually get to take quite such a hard line, but this is a common pain point, and we continue to add flexibility to our solution in order to mitigate the pain.
The world of computing is inexorably moving toward automating everything and distributing all the bits in containers. Don't fear it, embrace it. But make sure your team is up to the task, and partner with a good bodyguard -- like Crunchy Data -- to ensure reliability and success.
February 18, 2021