A change to ResultRelInfo - A Near Miss with Postgres 17.1
Since its inception Crunchy Data has released new builds and packages of Postgres on the day community packages are released. Yesterday's minor version release was the first time we made the decision to press pause on a release. Why did we not release it immediately? There appeared to be a very real risk of breaking existing installations. Let's back up and walk through a near miss of Postgres release day.
Yesterday when Postgres 17.1 was released there appeared to be breaking changes in the Application Binary Interface (ABI). The ABI is the contract that exists between PostgreSQL and its extensions. Initial reports showed that a number of extensions could be affected, triggering warning sirens around the community. In other words, if you were to upgrade from 17.0 to 17.1 and use these extensions, you could be left with a non-functioning Postgres database. Further investigation showed that TimescaleDB and Apache AGE were the primarily affected extensions. If you are using them, you should either hold off on upgrading to the latest minor release for now or rebuild the extension against the latest PostgreSQL release in coordination with your upgrade.
The initial list of extensions for those curious:
Affected | Unaffected |
---|---|
Apache AGE | HypoPG |
TimescaleDB | pg-query |
Citus | |
pglast | |
pglogical | |
pgpool2 | |
ogr-fdw | |
pg-squeeze | |
mysql-fdw | |
First, a little bit on Postgres releases. Postgres releases major versions each year, and minor versions every three months roughly. The major versions are expected to be forward compatible, but do introduce bigger changes that result in catalog changes. Major version upgrades are intended to be treated with caution. Minor version releases in contrast are intended to be only security and bug fix related. They are meant to be able to drop in and continue working within the same existing major version line.
About the Postgres ABI and Postgres Extensions
The Postgres ABI refers to the binary-level interface between Postgres and the compiled extensions, modules, or clients that interact with it. The ABI includes various structs that define key components of the system's internal workings. These structs represent how PostgreSQL manages and manipulates data, query execution, and memory. They typically include things like:
- System catalogs
- Function signatures
- Data structure layouts
Why Does the ABI Matter?
Developers of extensions ensure their extensions are compatible with the Postgres ABI. Changes to the ABI between major versions necessitate recompiling any extensions to prevent runtime issues.
ABI compatibility is typically not maintained across major versions. For instance, an extension compiled for PostgreSQL 14 will likely need to be recompiled for PostgreSQL 15 because ABI changes can occur.
PostgreSQL typically aims to maintain compatibility for extensions across minor versions. This means if you build an extension for PostgreSQL 15.1, it should work for 15.2. However, this is not always the case. The nuances of PostgreSQL's ABI guarantees have been a sufficiently hot topic that the community produced new documentation on the subject back in July.
Yesterday there was a major struct change in 17.1.
With us so far? Let’s go deeper
Within a PostgreSQL extension there is C code that includes header files from PostgreSQL itself. When the extension is compiled, functions from those headers are represented as abstract symbols in binary. The symbols are linked to the actual implementations of the functions when the extension is loaded based on the function names. That way, an extension compiled against PostgreSQL 17.0 can usually still be loaded into PostgreSQL 17.1, as long as the function names and signatures from headers do not change (i.e. the application binary interface or "ABI" is stable).
The header files also declare structs that are passed to functions (as pointers). Strictly speaking, the struct definitions are also part of the ABI, but there is more subtlety around that. After compilation, structs are mostly defined by their size and offsets of fields, so for instance a name change does not affect ABI (though does affect API). A size change does affect ABI, a little.
```c
typedef struct ResultRelInfo
{
    NodeTag type;
    /*... (130 other lines) ...*/
    /* updates do LockTuple() before oldtup read; see README.tuplock */
    bool ri_needLockTagTuple;
} ResultRelInfo;
```
Most of the time, PostgreSQL allocates structs on the heap using a macro that looks at the compile-time size of the struct ("makeNode") and initializes the bytes to 0. The discrepancy that arose in 17.1 is that a new boolean was added to the ResultRelInfo struct, which increased its size from 376 bytes to 384.
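A one-byte boolean growing a struct by eight bytes comes down to alignment padding. The sketch below illustrates this with two hypothetical structs, `DemoInfoOld` and `DemoInfoNew`. They are stand-ins invented for this example, not PostgreSQL's real definitions, and the exact numbers depend on the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node struct in the older minor release.
 * It contains pointers, so the whole struct has pointer alignment. */
typedef struct DemoInfoOld
{
    int   type;        /* plays the role of a node tag */
    void *relation;    /* some pointer-sized member */
    void *extra;       /* last member before the change */
} DemoInfoOld;

/* The same hypothetical struct after a bool is appended at the end. */
typedef struct DemoInfoNew
{
    int   type;
    void *relation;
    void *extra;
    bool  needs_lock;  /* new trailing member */
} DemoInfoNew;

/* Bytes the struct grew by, including the padding after the bool. */
static size_t demo_growth(void)
{
    assert(sizeof(DemoInfoNew) >= sizeof(DemoInfoOld));
    return sizeof(DemoInfoNew) - sizeof(DemoInfoOld);
}

/* Existing members keep their offsets; only the total size changes. */
static bool demo_offsets_unchanged(void)
{
    return offsetof(DemoInfoOld, relation) == offsetof(DemoInfoNew, relation)
        && offsetof(DemoInfoOld, extra) == offsetof(DemoInfoNew, extra);
}
```

On a typical 64-bit platform `demo_growth()` is 8 even though `sizeof(bool)` is 1, because the struct size is rounded up to a multiple of its alignment. Code that only reads the older members sees no difference, which is why a change like this can look harmless.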
What happens next depends on who calls makeNode. If it's PostgreSQL 17.1 code, then it uses the new size. If it's an extension compiled against 17.0, then it uses the old size. When it calls a PostgreSQL function with a pointer to a block allocated using the old size, the PostgreSQL function still assumes the new size and may write past the allocated block.
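The mismatch can be sketched without touching PostgreSQL at all. In the example below, `DEMO_MAKE_NODE`, `DemoInfoV0`, and `DemoInfoV1` are hypothetical names made up for illustration: the macro only mimics the idea of a makeNode-style allocation whose size is fixed when the caller is compiled. The code checks whether a write would be in bounds rather than performing the out-of-bounds write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layout the "extension" was compiled against. */
typedef struct DemoInfoV0
{
    int   type;
    void *relation;
} DemoInfoV0;

/* Hypothetical layout the newer "server" was compiled against. */
typedef struct DemoInfoV1
{
    int   type;
    void *relation;
    bool  needs_lock;  /* appended in the newer release */
} DemoInfoV1;

/* Stand-in for a makeNode-style macro: zeroed memory, with the size
 * baked in at the moment the calling code is compiled. */
#define DEMO_MAKE_NODE(type_) ((type_ *) calloc(1, sizeof(type_)))

/* The "extension" allocates a node using the old compile-time size.
 * The caller owns the result and must free() it. */
static DemoInfoV0 *demo_extension_make_node(size_t *size_out)
{
    DemoInfoV0 *node = DEMO_MAKE_NODE(DemoInfoV0);

    assert(node != NULL);
    *size_out = sizeof(DemoInfoV0);
    return node;
}

/* Would a write to the new member by the "server" stay inside a block
 * of alloc_size bytes? */
static bool demo_server_write_in_bounds(size_t alloc_size)
{
    return offsetof(DemoInfoV1, needs_lock) + sizeof(bool) <= alloc_size;
}
```

Passing the size returned by `demo_extension_make_node` to `demo_server_write_in_bounds` yields false, while passing `sizeof(DemoInfoV1)` yields true. That is the whole problem in miniature: both sides agree on the pointer, but not on how many bytes sit behind it.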
That is in general quite problematic. It could lead to bytes being written into an unrelated section of memory, or the program crashing. When running tests, PostgreSQL has internal checks (asserts) to detect that situation and throw warnings.
In practice, though, this particular change usually does not affect how much memory is actually allocated. PostgreSQL's default memory allocator rounds small requests up to the next power of two, so a 376-byte request and a 384-byte request both receive a 512-byte chunk, and a write to the new field still lands inside the block. There may be uninitialized bytes, but that is usually resolved by calling InitResultRelInfo. The issue primarily causes warnings in tests and assert-enabled builds for extensions that allocate ResultRelInfo, and only when those tests run on the new PostgreSQL version with an extension binary that was compiled against the old one.
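A simplified model of power-of-two size classes, the scheme PostgreSQL's default allocator uses for small requests, shows why both struct sizes can end up in the same chunk. The function name and the limits below are assumptions of this sketch; the real allocator also keeps chunk headers and extra bookkeeping in debug builds.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest and largest size classes in this simplified model.
 * Both values are assumptions of the sketch. */
#define DEMO_MIN_CHUNK ((size_t) 8)
#define DEMO_MAX_CHUNK ((size_t) 8192)

/* Round a request up to the next power-of-two size class. */
static size_t demo_chunk_size(size_t request)
{
    size_t chunk = DEMO_MIN_CHUNK;

    assert(request > 0 && request <= DEMO_MAX_CHUNK);
    while (chunk < request)
        chunk <<= 1;
    return chunk;
}
```

In this model `demo_chunk_size(376)` and `demo_chunk_size(384)` both return 512, so the extra eight bytes fall inside memory the old binary had already been given.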
Did we lose you yet, and so what’s the result?
Unfortunately, that's not the end of the story. Extensions that rely heavily on ResultRelInfo (like TimescaleDB) can do some things that suffer from the size change. For instance, in one of TimescaleDB's code paths, it needs to find the index of a ResultRelInfo pointer in an array, and to do so it does pointer math. This array was allocated by PostgreSQL with 384-byte elements, but the Timescale binary assumes 376-byte elements, and the result is a nonsense number which then hits an assert failure or segmentation fault.
To be clear, the code here is not really at fault. The contract with PostgreSQL was simply not quite as assumed. That's an interesting lesson for all of us who develop Postgres extensions.
It's quite possible that there are other issues like this in other extensions. TimescaleDB is quite popular and thus subject to broader testing that identified the issue. That said, as investigation has continued over the past 24 hours, most extensions that build against this header do seem to be safe so far. Another advanced extension is Citus, but from our investigation the Citus extension does seem safe.
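The arithmetic behind that nonsense index can be sketched with plain integers, using the two struct sizes quoted above. The function names are hypothetical and this is not TimescaleDB's code. Real pointer subtraction with a mismatched element size is undefined behaviour in C, and compilers may produce something far less tidy than the truncating division used here.

```c
#include <assert.h>
#include <stddef.h>

/* Element size assumed by the old binary and used by the new server. */
#define SIZE_OLD ((size_t) 376)
#define SIZE_NEW ((size_t) 384)

/* Byte offset of element i in an array laid out by the new server. */
static size_t demo_byte_offset(size_t i)
{
    assert(i <= (size_t) -1 / SIZE_NEW);  /* guard against overflow */
    return i * SIZE_NEW;
}

/* Index the old binary would derive if it divided by its own size. */
static size_t demo_index_seen_by_old_binary(size_t i)
{
    return demo_byte_offset(i) / SIZE_OLD;
}

/* Leftover bytes: nonzero means the pointer does not sit on an
 * element boundary from the old binary's point of view. */
static size_t demo_misalignment(size_t i)
{
    return demo_byte_offset(i) % SIZE_OLD;
}
```

Element 1 already sits 8 bytes past where the old binary expects an element to start, and by element 47 the drift adds up to a full element, so the old binary would compute index 48.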
What should you do?
If you’re a Crunchy Data customer you do not need to worry. Whether you’re using Crunchy Data Postgres on any platform, Crunchy Bridge, or Crunchy Postgres for Kubernetes, our build, release, and certification procedures worked as anticipated and appropriate mitigations were applied to all of our software releases. We are fortunate to have a fantastic build and release team that is largely behind the scenes but ensures issues like this are handled.
If you’re a community Postgres user, or have packaged your own extensions, it is worth reading the pgsql-hackers thread to understand which extensions have been determined to be potentially impacted, and what the potential mitigations are, for the affected version upgrades below:
- 17.0 -> 17.1
- 16.4 (and earlier) -> 16.5
- 15.8 (and earlier) -> 15.9
- 14.13 (and earlier) -> 14.14
- 13.16 (and earlier) -> 13.17
- 12.20 (and earlier) -> 12.21
In short:
- If you are using the TimescaleDB extension, Timescale is recommending that users do not perform the minor version installs at this time.
- If you are using extensions that are indicated as potentially impacted within the pgsql-hackers list thread, additional caution is warranted before upgrading (though our own Marco Slot has confirmed that Citus is not impacted).
- If you are compiling Postgres extensions from source, make sure your extensions have been compiled using the latest point version 17.1.
- If you are developing or installing custom Postgres extensions, it is worth taking the time to understand the impact of this particular issue and the Postgres ABI ‘commitments’.
Ultimately the default guidance of performing Postgres minor version upgrades stands, and the impact of this issue was not as broad as was initially feared. The Postgres community once again provided a timely minor version release to address a collection of CVEs and fixes, and the community promptly responded to a report of potential issues. The release processes of the ecosystem of Postgres providers worked as intended, and it appears any potential impact was largely averted.
That said, software is hard, and databases in particular are tricky. As Postgres extensions grow in popularity these risks will continue to emerge. It is helpful to understand these details yourself, or to make sure that whoever supports your database understands them.