While supporting customers at Crunchy Data, I sometimes see users add unlogged tables. In this post, I'm going to review some of the specifics of how they work and when they are (and are not) appropriate.
Unlogged tables in PostgreSQL are a way of preventing specific tables from generating WAL (Write-Ahead Log) information, by adding the UNLOGGED keyword to the CREATE TABLE command:

CREATE UNLOGGED TABLE foobar (id int);
You can also change existing tables to unlogged, and switch them back again:
ALTER TABLE mytable SET UNLOGGED; -- cheap!
ALTER TABLE mytable SET LOGGED;   -- expensive!
While there are strong advantages to using unlogged tables, you must use them carefully, as their disadvantages make the number of valid use cases small.

Advantages:

- Massive improvements to write performance (as seen below).
- Less vacuum impact (because vacuum changes are writes that also end up in the WAL stream).
- Less total WAL (leading to less traffic, smaller backups, etc.)

Disadvantages:

- Tables are truncated on crash recovery. No durability!
- Unlogged tables can only be accessed on the primary, not on the replicas.
- Unlogged tables can NOT be used in logical replication, or physical backups.
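You can always check whether an existing table is unlogged by looking at its entry in the pg_class system catalog:

```sql
-- relpersistence is 'p' for permanent, 'u' for unlogged, 't' for temporary
SELECT relname, relpersistence
FROM pg_class
WHERE relname = 'foobar';
```

The \d command in psql will also show "Unlogged table" in its header for these tables.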
Let's verify that an unlogged table prevents generating WAL. First, we will create a test table using the pgbench utility, and then update it a bit. Not only will we see how long the updates take, but we will peek inside the pg_wal directory and see how much WAL was created. For this test, it's important that autovacuum remains off and no other clients are connecting, to keep things honest.
$ # Ensure that autovacuum is (temporarily!) disabled:
$ psql -c 'alter system set autovacuum=off' -c 'select pg_reload_conf()'
$ # Generate a table with one million rows:
$ pgbench -i -s 10
dropping old tables...
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 1.25 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 3.60 s (drop tables 0.04 s, create tables 0.01 s, client-side generate 1.28 s, vacuum 1.81 s, primary keys 0.48 s).
$ ## Update one million rows:
$ touch gregtest; time psql -c 'update pgbench_accounts set aid=aid'
UPDATE 1000000

real 0m5.592s
user 0m0.002s
sys 0m0.000s
$ ## Determine how much WAL was generated by the above command:
$ DD=$(psql -Atc 'show data_directory')
$ find $DD/pg_wal -newer gregtest -size 16M | wc -l | xargs echo 16* | bc
320
Updating a million rows in a normal, logged Postgres table took us 5.6 seconds, and generated roughly 320MB of WAL. How different will an unlogged table be?
$ ## pgbench can directly create its tables as unlogged:
$ pgbench -i -s 10 --unlogged
dropping old tables...
creating tables...
generating data (client-side)...
1000000 of 1000000 tuples (100%) done (elapsed 0.73 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 1.56 s (drop tables 0.03 s, create tables 0.03 s, client-side generate 0.80 s, vacuum 0.15 s, primary keys 0.20 s).
$ touch gregtest; time psql -c 'update pgbench_accounts set aid=aid'
UPDATE 1000000

real 0m3.173s
user 0m0.002s
sys 0m0.000s
$ find $DD/pg_wal -newer gregtest -size 16M | wc -l | xargs echo 16* | bc
16
Not only did we perform the update nearly twice as fast as before, but the table creation time and the amount of WAL generated shrank as well! Since pgbench is a benchmarking utility, let's have it run for 10 seconds and see the difference in speed for the default test pgbench uses:
$ pgbench -i -s 10
$ pgbench --time 10
pgbench (15.1)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 1680
number of failed transactions: 0 (0.000%)
latency average = 5.949 ms
initial connection time = 7.761 ms
tps = 168.087557 (without initial connection time)
$ pgbench -i -s 10 --unlogged
$ pgbench -T 10
pgbench (15.1)
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 10 s
number of transactions actually processed: 4857
number of failed transactions: 0 (0.000%)
latency average = 2.059 ms
initial connection time = 2.522 ms
tps = 485.706168 (without initial connection time)
That's a big difference! 168 transactions per second versus 485 transactions per second for the unlogged version. So while the performance gains are real, let's look closer at the drawbacks.
The data in unlogged tables is ephemeral and lives outside of the strict data protection that Postgres usually provides. So should someone literally pull the plug on your Postgres database server, your data is 100% safe, except for unlogged tables, in which your data is 100% gone - by design! Because things are not logged, there is no way to guarantee data integrity, so the table is simply truncated. Note that a simple restart will NOT truncate the table:
$ pgbench -i -s 10 --unlogged
$ psql -Atc 'select count(*) from pgbench_accounts'
1000000
$ pg_ctl -D data restart
waiting for server to shut down.... done
server stopped
...
server started
$ psql -Atc 'select count(*) from pgbench_accounts'
1000000
However, a crash will truncate it. We can force Postgres to crash by forcibly killing one of its subsystems. In this example, we will pick on the walwriter backend. Do not do this on a production system!
$ pgbench -i -s 10 --unlogged
$ kill -9 $(psql -Atc "select pid from pg_stat_activity where backend_type = 'walwriter'")
$ psql -Atc 'select count(*) from pgbench_accounts'
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:  the database system is in recovery mode
$ psql -Atc 'select count(*) from pgbench_accounts'
0
That data is lost forever. Lucky for us, it was only random sample data generated by the pgbench program, so we can recreate it easily enough with pgbench -i. If you are using unlogged tables, you must always be ready to either lose the data forever, or have a way to quickly repopulate the table.
Because streaming replication (and, for that matter, log shipping of WAL files) uses information from the WAL files to update replicas, unlogged tables cannot be replicated. They do exist on the replica, but are empty and not accessible.
$ ## Spin up a replica
$ pg_basebackup -R -c fast -D data_replica
$ ## Set a new port:
$ echo port=5555 >> data_replica/postgresql.conf
$ ## Cluster names are handy:
$ echo "cluster_name='myreplica'" >> data_replica/postgresql.conf
$ ## We do not want this replica sending WAL files to pgbackrest
$ echo "archive_mode=off" >> data_replica/postgresql.conf
$ ## Start up the replica on port 5555
$ pg_ctl -D data_replica start
$ ## Generate some WAL traffic on the primary (default port)
$ pgbench -i --unlogged; pgbench -T 3
$ ## Confirm replication is happening
$ psql -c 'select application_name, replay_lag from pg_stat_replication'
 application_name |   replay_lag
------------------+-----------------
 myreplica        | 00:00:00.001221
On the replica, the table is not simply empty, it is completely inaccessible and throws an error when we try to view it:
$ psql -p 5432 -Atc 'select count(*) from pgbench_accounts'
100000
$ psql -p 5555 -Atc 'select count(*) from pgbench_accounts'
ERROR:  cannot access temporary or unlogged relations during recovery
Here, "recovery" means that the replica is constantly recovering, or playing through the WAL stream as it comes in. Because the WAL stream does not contain any information about changes to the table, we cannot look at its contents. Indeed, the underlying data files do not even exist on the replica. While the replica is almost a byte-for-byte copy of the primary, unlogged tables are a big part of that "almost".
$ pgbench -i
$ PRIMARYDIR=$( psql -Atc 'show data_directory' )
$ REPLICADIR=$( psql -p 5555 -Atc 'show data_directory' )
$ FILENODE=$( psql -Atc "select pg_relation_filenode('pgbench_accounts')" )
$ find $PRIMARYDIR -ls | grep $FILENODE | sort -k9
363139 13120 -rw------- 1 greg greg 13434880 Dec 25 8:03 data/base/5/94858
363184    24 -rw------- 1 greg greg    24576 Dec 25 8:03 data/base/5/94858_fsm
363183     8 -rw------- 1 greg greg     8192 Dec 25 8:03 data/base/5/94858_vm
$ find $REPLICADIR -ls | grep $FILENODE | sort -k9
363174 13120 -rw------- 1 greg greg 13434880 Dec 25 8:03 data_replica/base/5/94858
363180    24 -rw------- 1 greg greg    24576 Dec 25 8:03 data_replica/base/5/94858_fsm
364135     8 -rw------- 1 greg greg     8192 Dec 25 8:03 data_replica/base/5/94858_vm
The fsm and vm files stand for "free space map" and "visibility map" respectively, and contain metadata about the actual data, which as we can see is 13,434,880 bytes on both systems. Unlogged tables look different:
$ pgbench -i --unlogged
$ FILENODE=$( psql -Atc "select pg_relation_filenode('pgbench_accounts')" )
$ find $PRIMARYDIR -ls | grep $FILENODE | sort -k9
363180 13120 -rw------- 1 greg greg 13434880 Dec 25 8:03 data/base/5/94880
363174    24 -rw------- 1 greg greg    24576 Dec 25 8:03 data/base/5/94880_fsm
363181     0 -rw------- 1 greg greg        0 Dec 25 8:03 data/base/5/94880_init
363133     8 -rw------- 1 greg greg     8192 Dec 25 8:03 data/base/5/94880_vm
$ find data_replica -ls | grep 94880 | sort -k9
363173     0 -rw------- 1 greg greg        0 Dec 25 8:03 data_replica/base/5/94880_init
That init file is an empty version of the "real" table, known as the init fork, whose job is to replace the real file if the database crashes and starts up again, thus performing the "truncation" part of the unlogged table lifecycle.
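You can also locate the init fork without digging through the data directory yourself. A quick sketch (the filenode 94880 comes from the listing above and will differ on your system; pg_stat_file needs superuser or pg_read_server_files privileges):

```sql
-- Path of the table's main fork, relative to the data directory;
-- the init fork is the same path with an "_init" suffix:
SELECT pg_relation_filepath('pgbench_accounts');
-- e.g. base/5/94880

-- pg_stat_file confirms the init fork exists and is zero bytes:
SELECT size FROM pg_stat_file('base/5/94880_init');
```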
While we cannot access the contents of the table on the replica, all of its
metadata is still in place, so things like
\d still work fine:
$ psql -p 5555 -c '\d pgbench_accounts'
     Unlogged table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
 aid      | integer       |           | not null |
 bid      | integer       |           |          |
 abalance | integer       |           |          |
 filler   | character(84) |           |          |
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Tools like pgBackRest exclude temporary and unlogged tables from backups. Again, this means that you must treat all your data in unlogged tables as very transient. Let's prove that backups do not copy the files:
$ psql -c 'create table aaa as select 867 as id from generate_series(5,309)'
$ psql -c 'create unlogged table bbb as select 867 as id from generate_series(5,309)'
$ psql -c 'vacuum aaa, bbb'
$ AAA=$( psql -Atc "select pg_relation_filenode('aaa')" ) ## gives 980808
$ BBB=$( psql -Atc "select pg_relation_filenode('bbb')" ) ## gives 980811
$ pgbackrest --stanza=mydemo backup
$ grep -E "$AAA|$BBB" /var/log/pgbackrest/mydemo-backup.log | cut -d\( -f1 | sort -k7
2023-12-25 13:09:37.482 P02 DETAIL: backup file /home/greg/pg/15/data_replica/base/5/980808
2023-12-25 13:09:34.492 P02 DETAIL: backup file /home/greg/pg/15/data_replica/base/5/980808_fsm
2023-12-25 13:09:37.479 P02 DETAIL: backup file /home/greg/pg/15/data_replica/base/5/980808_vm
2023-12-25 13:09:40.658 P02 DETAIL: backup file /home/greg/pg/15/data_replica/base/5/980811_init
If you peer closely at the output of our pgbackrest logs above, you will notice that table 'aaa' had three files for it backed up: the main file holding the rows, a visibility map file, and a free space map file. In contrast, the only thing that got copied for our unlogged table 'bbb' was the "init" file, which upon a restore will become a new and empty version of the "bbb" table. Also observe that our backup pulled the file from the replica, thanks to the awesome backup-standby pgBackRest feature.
Be careful, because tools that take logical backups, such as pg_dump, will still include the data rows, although this is not necessarily a good thing :)
$ pg_dump -t bbb | wc -l
350
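If you would rather have pg_dump treat unlogged tables the way a crash does, keeping the table definition but skipping its rows, it has a flag for exactly that:

```shell
$ ## Dump the definition of bbb, but none of its data:
$ pg_dump --no-unlogged-table-data -t bbb
```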
If you take a table that is logged and change it to unlogged, the process is relatively quick. While a new file on disk is written, no WAL is generated.
$ pgbench -i -s 20
$ time psql -c 'alter table pgbench_accounts set unlogged'
ALTER TABLE

real 0m1.754s
user 0m0.000s
sys 0m0.004s
On the other hand, changing a table from unlogged to logged means writing information about the entire file to the WAL stream, which can be expensive:
$ pgbench -i -s 10 --unlogged
$ time psql -c 'alter table pgbench_accounts set logged'
ALTER TABLE

real 0m16.601s
user 0m0.004s
sys 0m0.000s
As we have seen, the use of unlogged tables in Postgres can give you a very large performance improvement, but with some very large drawbacks. If the data is temporary and/or easily replaced, you may want to give unlogged tables a try. Just treat your unlogged tables as living in a Schrödinger's box, in which at any moment the table could contain all the rows, or none of the rows! :)
Greg Sabino Mullane
February 17, 2023