At Crunchy Data we spend a lot of time helping customers migrate their databases. Migrating Postgres tends to be a very straightforward process. Yet there can still be a few gotchas that can catch you off-guard if you are not prepared to deal with them. From some recent experiences with customers migrating to Crunchy Bridge we found most customers had not considered the underlying collations. These customers ran a risk of data corruption by not handling collation review and updates as part of their migration. A mismatched glibc is one of those details that could actually be a big gotcha and quite the headache if you are unaware of it - so we wanted to cover a few quick details.
Why should I care about mismatched
glibc versions can have a risk of:
- Missing data when you query it
- Inconsistent sorting between versions
- Undetected unique constraint violations
These can all result in data corruption issues. For example, if you have a unique constraint on email addresses and the sort returns is different across versions - you may now have two accounts for users. You may get empty results when you query. Reconciling data corruption may be simple if it is a single record, but the longer it lives the bigger the cleanup and can result in weeks of pain.
We’ve seen differences between glibc versions when:
physical replication to migrate databases
pgbackrest) from one host to a new one.
- Restoring a binary backup (i.e.,
pg_basebackup) on a system with different OS configuration.
- Upgrading the Linux distribution to a new major release while keeping the PostgreSQL data directory. In this case, the glibc version may have changed but your underlying data did not.
Not all types of migrations or replication are affected by this inconsistency. Situations where the data is transported in a logical (not binary) way are quite safe, including:
- Backups and restore processes using
pg_dump, since these use logical data only
- Logical replication, which uses only a data copy and not the physical copy
How the sorting works
For a very simple but practical example, on glibc versions older than 2.28 we can run this query and see how data sorts.
old-glibc::DATABASE=> SELECT * FROM (values ('a'), ('$a'), ('a$'), ('b'), ('$b'), ('b$'), ('A'), ('B')) AS l(x) ORDER BY x ; x ---- a $a a$ A b $b b$ B (8 rows)
Then run the same on a newer version:
new-glibc::DATABASE=> SELECT * FROM (values ('a'), ('$a'), ('a$'), ('b'), ('$b'), ('b$'), ('A'), ('B')) AS l(x) ORDER BY x ; x ---- $a $b a A a$ b B b$ (8 rows)
What is glibc?
Libc is the main C library used by the Linux system. Many Linux programs, including Postgres, use the glibc implementation. It is used to provide many fundamental software operations and is used inside Postgres to do things like sorting text or comparing data when creating indexes.
glibc 2.28 in 2018 brought localization and collation
information into compliance with the 2016 Edition 4 ISO 14651 standards. With
the update, indexes that were created with a prior version of the collations
potentially exhibit corruption when being read by a system using the updated
collations. If there is a mis-match the indexes must be rebuilt to avoid issues.
What collations are you using?
You can find the data collation your databases are using via the
SELECT datname, datcollate FROM pg_database; datname | datcollate -----------+------------- postgres | en_US.UTF-8 demo | en_US.UTF-8 template1 | en_US.UTF-8 template0 | en_US.UTF-8 (4 rows)
And to check your
glibc version (this query is environment dependent)
select collname, collversion from pg_collation where collprovider = 'c'; collname | collversion ------------------+------------- C | POSIX | ucs_basic | C.utf8 | en_AG | 2.28 en_AU | 2.28 en_AU.utf8 | 2.28
How do I fix it?
Fix during a migration
Since this issue shows up with binary data that is moved across operating
glibc versions, this generally shows up during a migration. Migrating
via logical replication or logical backup (i.e.,
pg_dump) eliminates the issue
as any affected indexes will be recreated at the time. So changing direction to
logical restore might be worth thinking about.
For large databases, in excess of 100GB, logical backup migrations can take longer than desirable. In those cases, WAL migration followed by rebuilding affected indexes is generally the method we prefer in order to minimize downtime and engineering effort.
On a live database
If you think you might have an issue collations post migration, the
extension help identify any data inconsistencies.
SELECT bt_index_check(index => c.oid, heapallindexed => true), c.relname, c.relpages FROM pg_index i JOIN pg_opclass op ON i.indclass = op.oid JOIN pg_am am ON op.opcmethod = am.oid JOIN pg_class c ON i.indexrelid = c.oid JOIN pg_namespace n ON c.relnamespace = n.oid WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog' -- Don't check temp tables, which may be from another session: AND c.relpersistence != 't' -- Function may throw an error when this is omitted: AND c.relkind = 'i' AND i.indisready AND i.indisvalid ORDER BY c.relpages; bt_index_check | relname | relpages ----------------+-----------------------------------------------+---------- | pg_publication_pubname_index | 1 | pg_largeobject_loid_pn_index | 1 | pg_largeobject_metadata_oid_index | 1
bt_index_check is empty, there’s no indexes that would change collations if
REINDEX is run. If the index check shows information, you’ll likely need to
Side note: The
amcheck extension can be a somewhat resource intensive process
to run, both in terms of I/O and time. If you have a large or critical database,
consider running this on a physical replica so you don’t disrupt production
workflows, as this can be detected on a binary copy of the database.
If you’ve found an issue with the above steps, you’ll need to
REINDEX CONCURRENTLY. (Note: If you are using Postgres 14, we recommend to use
14.4 or higher to
REINDEX CONCURRENTLY to avoid further risk of corruption).
Data migrations can often be straightforward but you want to make sure you ask the right questions instead of assuming things will just work. We hope you’ll have found this helpful as you consider a migration, but if you have additional questions please reach out as we may be able to help.
September 2, 2022 •More by this author