How To Correct and Identify Indexes Affected by the GNU C 2.28 Update
Version 2.28 (release notes) of the GNU C library introduces many changes to the collations it provides. Collations determine how strings are compared and by default, PostgreSQL uses the operating system’s collations which on Linux means glibC. When your operating system updates to this version of glibc and you aren't using the “C” or “POSIX” collation, you may encounter some differently ordered indexes. This unexpected change in the order of indexes will lead to incorrectly ordered query results and possible data corruption. Currently, the following distributions are affected:
- Ubuntu 18.10 (cosmic)
- RHEL/CentOS 8
- Debian 10 (buster)
How can a system update cause index corruption?
By default, PostgreSQL uses the operating systems collations (as provided by the GNU C library). Since collations determine how strings are compared, changing them can affect the result of the ORDER BY clause in SELECT statements and the order of the keys in B-tree indexes. Changes to the behavior of ORDER BY can be annoying, but changing the order of an index can lead to incorrect query results and duplicate entries.
You may also see rows being stored in the wrong partition if using columns of type text, varchar, char, or citext in the partition key on range-partitioned tables.
What indexes are affected?
Only indexes involving columns of type text, varchar, char, or citex are affected. Databases or table columns using the “C” or “POSIX” locales and table columns using collations with the ICU provider are not affected.
You can identify affected indexes in all databases with the following query:
SELECT indrelid::regclass::text, indexrelid::regclass::text, collname, pg_get_indexdef(indexrelid)
FROM (SELECT indexrelid, indrelid, indcollation[i] coll FROM pg_index, generate_subscripts(indcollation, 1) g(i)) s
JOIN pg_collation c ON coll=c.oid
WHERE collprovider IN ('d', 'c') AND collname NOT IN ('C', 'POSIX');
How can I correct this?
The simplest method is to rebuild the indexes by running REINDEX index_name
for all affected indexes. However, this will block all access to the index and prevent DDL on the table during the process. Starting with PostgreSQL version 12 (to be released this fall), you can use REINDEX CONCURRENTLY to avoid these limitations. For older versions of PostgreSQL you can work around these limitations by creating a copy of the index using CREATE INDEX CONCURRENTLY
, then dropping the original index as shown below:
CREATE INDEX CONCURRENTLY myindex2 ON mytable (textcol);
DROP INDEX myindex;
ALTER INDEX myindex2 RENAME TO myindex;
It should be noted that the above will not work for primary keys. Additionally sufficient storage for both indexes will be needed for the duration of the process described above.
You can find more information about the locale data changes here: https://wiki.postgresql.org/wiki/Locale_data_changes
and the Postgres documentation on managing collations here: https://www.postgresql.org/docs/10/collation.html#COLLATION-MANAGING
Related Articles
- PostGIS Day 2024 Summary
8 min read
- Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
8 min read
- Loading the World! OpenStreetMap Import In Under 4 Hours
6 min read
- Easy Totals and Subtotals in Postgres with Rollup and Cube
5 min read
- A change to ResultRelInfo - A Near Miss with Postgres 17.1
8 min read