My colleague Bob Pacheco asked me to help with a strange problem he was witnessing for a client. A new Postgres cluster was getting created on a Kubernetes node, but it refused to start. More mysteriously, there was no message in the Postgres logs, nothing in the pg_ctl start up file, and no output anywhere. The pg_ctl program started up, then ended. Nothing on stderr, nothing on stdout! We were able to duplicate it on the command line using pg_ctl. Oddly enough, a syntax error in the postgresql.conf would cause output (an error). That part happens very early, so something was happening after that point - but before the logging collector started up. Time to go to the lower level.
The executable program that actually runs Postgres itself is called, aptly enough, postgres. You may still see it referred to as postmaster. Before version 16 of Postgres, there was a symlink named postmaster that pointed to the postgres executable. Even though that symlink is no longer created, you will still see this name in use, such as the postmaster.pid file that appears in the data directory when you start up Postgres.
Tools like pg_ctl and systemd call this postgres executable themselves, making sure to pass in parameters, set up stderr redirects, and so forth. But you can run it yourself directly! This is a great debugging tool to see what is happening as Postgres comes to life. For this session, we decided to start it in --single mode, which creates a "stand-alone backend". In other words, it brings Postgres up, but does not let anyone else in, does not start any background processes (e.g. autovacuum), and does not listen for incoming connections. In short, a nice and simplified Postgres to flush out problems. To use it, run the postgres executable with --single as the first argument, and tell it where the data directory is with the -D argument.
Our first attempt looked like this and immediately revealed a bus error and a core dump!
$ bin/postgres --single -D data
Bus error (core dumped)
Well, that explains the lack of output! A bus error occurs when a process attempts a memory access that the kernel cannot complete. The kernel terminates the attempt by sending the process a SIGBUS signal. The core dump indicates that the postgres executable was caught off guard by this, and crashed before it could do any sort of cleanup or output any messages. Hence the lack of output that started this whole investigation.
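As a small illustration of why there was nothing to read: a process killed by a signal never gets a chance to write anything; all the parent shell sees is the exit status. This throwaway snippet (not part of our session, just a demonstration) sends SIGBUS to a child shell to show what that looks like:

```shell
# Have a child shell send SIGBUS to itself; the default action kills it
# immediately, before any cleanup or logging code could possibly run.
sh -c 'kill -BUS $$'
status=$?
# The shell reports 128 + signal number; on x86-64 Linux SIGBUS is 7, so 135.
echo "child exit status: $status"
```

The same arithmetic explains why a crashed postgres leaves an empty log: the output never happened, and the only evidence is the status and the core file.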
One trick to rule out any settings or extensions is to load Postgres in a pristine state - in other words, with all configuration values at their default. To do that, you can provide your own configuration file for the postgres executable to read. Creating an empty file will do the trick:
$ touch empty
$ bin/postgres --single -D data --config-file=empty
Bus error (core dumped)
Still no change, so we can rule out extensions and configuration items. The next step was to run it with debugging cranked all the way up. The postgres executable takes a -d argument with values from 1 to 5, so we ran:
$ bin/postgres --single -D data -d 5
2023-12-25 12:34:00.767 EST  DEBUG: invoking IpcMemoryCreate(size=1515356162)
Bus error (core dumped)
The "Bus error" was a strong clue that this was memory-related, and we can see that the last normal thing Postgres did was request some shared memory from the operating system. Normally, I would blame SELinux for mysterious startup problems. However, when memory is involved, there is a better suspect to consider: huge pages. Traditionally, Linux allocates physical memory in pages of 4 kB. As most modern programs require a lot of memory, that means gathering up and doing housekeeping for a LOT of those 4 kB chunks. With huge pages, the kernel can use much, much larger blocks (typically 2 MB). You can see your current huge page size with:
$ grep Hugepagesize /proc/meminfo
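That single line only shows the page size. A slightly fuller sketch of a system's huge-page state, using standard Linux paths:

```shell
# All the huge-page counters from the kernel, not just the size:
grep '^Huge' /proc/meminfo
# How many huge pages the kernel currently has reserved;
# 0 means an explicit huge-page request will fail (or fall back):
cat /proc/sys/vm/nr_hugepages
```

As an aside, on Postgres 15 and later you can ask an offline cluster how many pages it would need with postgres -C shared_memory_size_in_huge_pages -D data, which is handy when sizing vm.nr_hugepages.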
The next step was to try loading Postgres without huge pages. The default value for current versions of Postgres is huge_pages=try, so we wanted to force it to "off". By passing the configuration item directly on the command line, we can ensure that it is forced off, as command-line arguments override anything set in the configuration files:
$ bin/postgres --single -D data --huge_pages=off
2023-12-25 12:35:00.767 EST  DEBUG: invoking IpcMemoryCreate(size=1515356162)
2023-12-25 12:35:54.961 EST  DEBUG: dynamic shared memory system will support 1674 segments
2023-12-25 12:35:54.961 EST  DEBUG: created dynamic shared memory control segment 2743578136 (269765 bytes)
2023-12-25 12:35:54.966 EST  DEBUG: InitPostgres
2023-12-25 12:35:54.966 EST  DEBUG: my backend ID is 1
Voila - our database started up! After we made the huge_pages change permanent in the postgresql.conf file, the cluster started up without any problem. But what exactly was going on with huge pages here?
It turns out this was a Linux kernel bug, in which the kernel was happy to hand you the memory, but then got upset when you tried to actually use it. Obviously, it should have denied the initial request, a situation Postgres is well-equipped to handle gracefully, without any core dumps. I found this thread on the Linux kernel list that would seem to be the bug in question. We also verified that the 4.18 kernel version we were using predated the version containing the fix. So presumably once the kernel is newer, we can switch huge_pages back to "try"!
See more about our work on fixing this Linux kernel issue with Crunchy Postgres for Kubernetes.
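For reference, "making the change permanent" as mentioned above amounts to one line in postgresql.conf. A minimal sketch, assuming a data directory simply named data as in the session above:

```shell
PGDATA="${PGDATA:-data}"   # assumed path; point this at your cluster's data directory
mkdir -p "$PGDATA"         # a no-op for an existing cluster
echo "huge_pages = off" >> "$PGDATA/postgresql.conf"
grep -n 'huge_pages' "$PGDATA/postgresql.conf"
```

On a running cluster, ALTER SYSTEM SET huge_pages = off; writes the same setting to postgresql.auto.conf instead. Either way, huge_pages only takes effect after a full restart, not a reload.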
- When troubleshooting, start complex and work down to simple. In this case, we went from pg_ctl to running the postgres executable directly, and finally to single-user mode.
- All the parts of your stack are important. This was a Linux kernel bug, but other systems could affect Postgres as well.
- Keep everything up to date. In this case, a newer version of the kernel had already fixed the problem.
- Huge pages are overall a good thing. Leave it as "try" unless you can prove it is causing problems.
Greg Sabino Mullane
February 9, 2024