Online Upgrades in Postgres
In our previous blog post, we talked about upgrading a PostgreSQL cluster to a new major version with pg_upgrade. This can be fast and with little downtime even for large databases, but in some instances zero downtime may be essential for doing a major upgrade. This method is called an "online upgrade" and can be achieved through logical replication. While logical replication can help to achieve a zero-downtime, online upgrade, there are still some things to consider. For some hands-on experience, you can check out this learning module on logical replication.
Logical replication can be useful if you have a few test databases that you don’t want to replicate to the upgrade destination. This is possible with logical replication, where pg_upgrade is all or nothing.
If your database is large, it can take a bit of time to traverse the network. Additionally, this upgrade method is only capable of "per DB" or “per table” replication unlike pg_upgrade. We’ll be going over the former in the steps ahead.
There are quite a few preliminary steps needed to set up logical replication for an upgrade. For example, you'll need to change the wal_level in the postgresql.conf to "logical", which will require a restart of the cluster. It's also worth noting that versions of PostgreSQL that are 9.4 or older need a third party extension to install logical replication.
Throughout this blog we're going to perform an online upgrade on the "lrtest" database. I'll be doing this on the same system using port 5411 for my PostgreSQL 11.9 version, and 5412 for my PostgreSQL 12.4 version. With those considerations in mind, let's get started.
Prerequisites and Configuration Changes
Create Primary Keys (PKs) if you don't already have PKs in use for each table that you plan to replicate. This query shows you which tables DON’T have primary keys. This should show 0 tables before you start the upgrade.
psql -U postgres -d lrtest -p 5411 psql (11.9) Type "help" for help. SELECT n.nspname as schema, c.relname as table FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind = 'r' AND NOT EXISTS ( SELECT 1 FROM pg_constraint con WHERE con.conrelid = c.oid AND con.contype = 'p' ) AND n.nspname <> ALL ( ARRAY [ 'pg_catalog', 'sys', 'dbo', 'information_schema' ] ); schema | table --------+------- (0 rows)
If you don't get a value of 0 rows here, stop and solve for that by creating PKs on your tables. Another important thing to note is that if you’ve tuned your postgresql.conf for memory, max connections, etc., those parameters are not replicated in the logical replication process. As always, it’s incredibly important to test your application on the newest PostgreSQL version and ensure the destination postgresql.conf is tuned appropriately.
We need to allow for replication connections. Let’s edit the pg_hba.conf on both nodes (substitute 0.0.0.0/0 to match the appropriate IP/CIDR). Since we’re changing the pg_hba.conf on both nodes, we’ll need to restart both nodes later on. Add the following lines, but keep in mind that the pg_hba.conf is read from top to bottom. In my use case, I’m using “trust”. This means you'll need to place these lines appropriately per the documentation:
local lrtest logicalrep trust host lrtest logicalrep 127.0.0.1/32 md5
Since logical replication requires a replication role/user, we’ll create one now. Here, we’ll technically create a “role” with login permissions which is effectively a user. We'll do this on the source cluster.
psql -U postgres -d lrtest -p 5411 psql (11.9) Type "help" for help. CREATE ROLE logicalrep LOGIN REPLICATION ENCRYPTED PASSWORD ‘yourpassword’; CREATE ROLE lrtest=#
Do note that depending on your log settings, the password listed here will show
in the PostgreSQL logs. I'll also set
logical before I connect
lrtest user while I’m still connected.
ALTER SYSTEM SET wal_level TO logical; ALTER
We’ve updated all of the parameters that will require either a reload or a
restart at this point. Let’s restart the database (source) so the config changes
are captured. If systemd is in use you can run
sudo systemctl restart postgresql-nn, where
nn is the version number. In
this case I restarted via
pg_ctl -D /path/to/data/dir restart -mf since I'm
not using systemd.
Next, we want to ensure that all of the necessary global objects like; users, their passwords, tablespaces, etc. are carried over to the destination. In order to do so we pg_dumpall to a .sql file. We’ll use this file to restore these objects after the pg_dumpall is completed.
pg_dumpall -p 5411 -U postgres -g -f /tmp/globals.sql
Since in this instance both clusters are on the same node, I don’t have to move the file. If you’re doing this across the WAN, you’ll need to move the globals.sql file to the destination accordingly. Once the file is on the destination node, we can run the following command to add the global objects.
psql -U postgres -d postgres -p 5412 -f /tmp/globals.sql SET SET SET CREATE ROLE ALTER ROLE psql:/tmp/globals.sql:16: ERROR: role "postgres" already exists ALTER ROLE
You’ll notice an “ERROR” regarding the Postgres user already existing. You can ignore this. After the dump you can see that the global logicalrep user and password are now present on the destination.
psql -p 5412 -U postgres -d lrtest psql (12.4) Type "help" for help. table pg_shadow; usename | usesysid | usecreatedb | usesuper | userepl | usebypassrls | passwd | valuntil | useconfig ------------+----------+-------------+----------+---------+--------------+-------------------------------------+----------+----------- logicalrep | 16399 | f | f | t | f | md59bccb4e5d8d0bb18b691d519254b3a68 | | postgres | 10 | t | t | t | t | | | (2 rows)
For the source database you want to replicate, run a schema only dump in addition to the above global dump. This will allow us to restore the schema information on the destination.
pg_dump -Fc -s -d lrtest -p 5411 -U postgres -f /tmp/lrtest_schema.dmp
We need to do this step because schema information is not replicated over in logical replication. Keep this in mind should you change schema information on your source with logical replication in place. Next, copy the schema dump to the destination location and restore.
pg_restore -C -d template1 -p 5412 -U postgres /tmp/lrtest_schema.dmp
Now that the global objects and schemas are all in place on the destination, we can create the publication by running the following on the source (11.9) cluster. As mentioned above, I’m only upgrading one table, but it’s the only table in the database. If you only wanted to replicate a single table, but had multiple tables in the database, you would pass FOR TABLE, per the documentation.
psql -p 5411 -U postgres -d lrtest psql (11.9) Type "help" for help. create publication lrtest_pg12_upgrade for all tables;
I’m doing the upgrade on the same system, if you’re upgrading across the WAN,
you’ll need to amend
host=, password= in the single quotes with the rest of
the connection string.
psql -p 5412 -U postgres -d lrtest psql (12.4) Type "help" for help. create subscription lrtest_pg12_upgrade connection 'port=5411 dbname=lrtest user=logicalrep' publication lrtest_pg12_upgrade; NOTICE: created replication slot "lrtest_pg12_upgrade" on publisher CREATE SUBSCRIPTION \q
At this point everything looks good in the logs, but we should verify.
To verify the data is present on the destination:
psql -p 5412 -U postgres -d lrtest psql (12.4) Type "help" for help. select count(*) from chanco; count ------- 100 (1 row) \q
Verify that number with the number on the source:
psql -p 5411 -U postgres -d lrtest psql (11.9) Type "help" for help. select count(*) from chanco; count ------- 100 (1 row) \q
After verifying the row count matches the source, let’s add a row to check and ensure it carries over from the source to the destination.
psql -p 5411 -U postgres -d lrtest psql (11.9) Type "help" for help. insert into chanco values(default,'crunchydata','8437376045','crunchy data software', 'email@example.com'); INSERT 0 1 select * from chanco where name = 'crunchydata'; id | name | phone | company | email -----+-------------+------------+-----------------------+---------------------- 101 | crunchydata | 8437376045 | crunchy data software | firstname.lastname@example.org (1 row) \q
Check for row on destination:
psql -p 5412 -U postgres -d lrtest psql (12.4) Type "help" for help. select * from chanco where name = 'crunchydata'; id | name | phone | company | email -----+-------------+------------+-----------------------+---------------------- 101 | crunchydata | 8437376045 | crunchy data software | email@example.com (1 row) select count(*) from chanco; count ------- 101 (1 row) \q
The original 100 rows from 11.9, have successfully replicated over to the 12.4 cluster. We’ve even gone a step further by ensuring that post upgrade replication works if a new row is added. At this point with logical replication running, you can leave it in place until you are ready to do the cut-over. If all of your data has been replicated, and you've already done the cut-over, you can drop the subscription.
psql -p 5412 -U postgres -d lrtest psql (12.4) Type "help" for help. DROP SUBSCRIPTION lrtest_pg12_upgrade; NOTICE: dropped replication slot "lrtest_pg12_upgrade" on publisher DROP SUBSCRIPTION #
Now that the online upgrade procedure has successfully completed, we can remove the old cluster if you no longer need it around.
I hope this information has proven useful. Be sure to check the Crunchy Data blog often as content is constantly being added, and feel free to stop by and follow us on twitter (@crunchydata) or sign up for our newsletter for additional up to date information on PostgreSQL!
James Chanco Jr.
October 20, 2020 •More by this author