The Rest is History: Investigations of WAL History Files
PostgreSQL uses the concept of a timeline to identify a series of WAL records in space and time. Each timeline is identified by a number, a decimal in some places, hexadecimal in others. Each time a database is recovered using point in time recovery and sometimes during standby/replica promotion, a new timeline is generated.
A common mistake is to assume that a higher timeline number is synonymous with the most recent data. While the highest timeline points to the latest incarnation of the database, it doesn't guarantee that the database indeed holds the most useful data from an application standpoint. To discern the validity of this statement, a closer examination of the Write-Ahead Logging (WAL) history files is essential, unraveling the messages they convey.
In this discussion, we will explore a recovered database and trace the narrative embedded in the history files. By the conclusion, you will have gained a deeper insight into the functionality of these history files within Postgres, empowering you to address queries related to recovery processes and the database's historical journey (or may I call it the 'family tree').
Assessing current state
Let's begin by gaining insights into the current status of the database. The information obtained from the pg_controldata
output indicates that the database is currently on timeline 11. Take note of the latest checkpoint Write-Ahead Logging (WAL) file, identified as '0000000B0000000100000039'. See my previous post on WAL file naming and numbering.
It's worth noting that timelines are sometimes expressed in decimal form, as in the case of 11, and at other times in hexadecimal form, such as '0000000B'. While this dual representation might be perplexing initially, familiarity with when and where these different forms are employed will contribute to a clearer understanding.
$ pg_controldata
...
pg_control last modified: Tue 06 Feb 2024 03:10:53 PM EST
Latest checkpoint location: 1/39000060
Latest checkpoint's REDO location: 1/39000028
Latest checkpoint's REDO WAL file: 0000000B0000000100000039
Latest checkpoint's TimeLineID: 11
...
Another crucial aspect to consider is examining the contents of the pg_wal directory to identify the existing history files. If the current database was not the primary when timeline 11 was created, it may only have the latest history file present. In this case, the server we are investigating was and is the primary Postgres instance.
$ ls -1 $PGDATA/pg_wal/*.history
00000003.history
0000000A.history
0000000B.history
At first glance, one can assume that timeline 11 (remember the history file and segment files use hexadecimals) has a family tree that stems from timelines 10 (0000000A) and 3 (00000003). As for any assumption, we must verify this before we can confirm this assumption as fact. To do this, let's take a look at the contents of the history file for timeline 11.
$ cat 0000000B.history
1 0/710000A0 no recovery target specified
2 0/72000000 no recovery target specified
3 1/22000CE0 before 2024-02-03 00:37:49.381764-05
10 1/230000A0 no recovery target specified
Looking at the contents of the history file we see the 'family tree' of timeline 11. Timeline 11 was created from timeline 10 at LSN 1/230000A0. Timeline 10 was created from timeline 3 at LSN 1/22000CE0. Wait! What about timelines 4 - 9? What does 'no recovery target specified' and 'before 2024-02-03 00:37:49.381764-05' mean? These are the right questions to be asking. Let's continue our quest.
History file contents
Starting from the top of timeline 11's history file, we read down the list to see the family tree. The 'no recovery target specified' lets us know that the timeline was created, more than likely, from a promotion (select pg_promote()
for example). Timeline 10 on the other hand was created using a point in time recovery performed against timeline 3. Where does the timestamp come from? Was that the timestamp of the last transaction? More great questions. Let us explore those.
The first thing we need to do is examine the contents of the WAL segment that timeline 10 was created from. In this case, timeline 10 was created from timeline 3 at LSN 1/22000CE0. Translating the LSN into the exact WAL segment gives us 000000030000000100000022. The number in the LSN before the slash is the 'high number' while the first few characters after the slash is the low number. These two numbers prefixed with the timeline give us the WAL segment name (remember in hexadecimal). Below is an extract from pg_waldump
of this segment.
$ pg_waldump 000000030000000100000022
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 1/22000028, prev 1/21000138, desc: RUNNING_XACTS nextXid 1035 latestCompletedXid 1034 oldestRunningXid 1035
rmgr: Heap len (rec/tot): 61/ 1666, tx: 1035, lsn: 1/22000060, prev 1/22000028, desc: DELETE xmax: 1035, off: 4, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/5/16684 blk 0 FPW
rmgr: Heap2 len (rec/tot): 1460/ 1460, tx: 1035, lsn: 1/220006E8, prev 1/22000060, desc: MULTI_INSERT ntuples: 1, flags: 0x02, offsets: [5], blkref #0: rel 1663/5/16684 blk 0
rmgr: Heap2 len (rec/tot): 63/ 63, tx: 1035, lsn: 1/22000CA0, prev 1/220006E8, desc: PRUNE snapshotConflictHorizon: 1034, nredirected: 0, ndead: 4, nunused: 0, redirected: [], dead: [1, 2, 3, 12], unused: [], blkref #0: rel 1663/5/16684 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1035, lsn: 1/22000CE0, prev 1/22000CA0, desc: COMMIT 2024-02-03 00:37:49.381764 EST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 1/22000D08, prev 1/22000CE0, desc: RUNNING_XACTS nextXid 1036 latestCompletedXid 1035 oldestRunningXid 1036
rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 1/22000D40, prev 1/22000D08, desc: SWITCH
At LSN 1/22000CE0 is where timeline 10 was created. Looking at the above, we see that position has a commit with a timestamp of '2024-02-03 00:37:49.381764 EST' (the last set of characters from the LSN is the offset within the WAL segment). The WAL history file is telling us that timeline 10 was created prior to this commit ('before ...'). Whatever transaction was committed at CE0 will not be in timeline 10.
So why is this timestamp relevant? To understand this, let me provide some background. The timestamp passed to pgBackRest for the point in time recovery was '2024-02-03 00:30:10 EST'. The commit at CE0 was the first transaction after our recovery target time. Thus, the history file shows 'before 2024-02-03 00:37:49.381764 EST'.
One last word before we move on. The last piece of information the history file has for us is that timeline 11 was created from timeline 10. Based on the 'no recovery target specified' we can safely assume this was from a promotion type event or there was a recovery and no more WAL segments were known or available.
Missing timelines
What about timelines 4 - 9? Don't worry, I have not forgotten this question. To get the history files for those timelines we will need to go to the pgBackRest repository. To retrieve those, I am going to execute a statement similar to the below to restore them from pgBackRest:
$ pgbackrest archive-get 00000004.history --stanza=rhino /app/pgdata/rhino.16/pg_wal/00000004.history
The above is repeated for timelines 4 - 9. We will start our investigation from timeline 9's history file. Below is the content:
$ cat 00000009.history
1 0/710000A0 no recovery target specified
2 0/72000000 no recovery target specified
3 3/DA0000A0 no recovery target specified
4 3/DB0000A0 no recovery target specified
5 3/DC000000 no recovery target specified
6 5/2E0000A0 no recovery target specified
7 5/300000A0 no recovery target specified
8 5/570000A0 no recovery target specified
Looking at the above we see that there was a normal progression (meaning no recoveries) from timeline 1 - 9. This does not mean that timeline 4, for example, does not have any updates past LSN 3/DB0000A0. That is a different topic for a different blog post. If we could graph our timelines it would look something like this:
Our journey has taken us pretty far to this point. However, we need to answer the opening question about which timeline has the latest data. First, we need to know how long timeline 10 and 11 have existed. To do this, we will once again perform a dump using pg_waldump
of the starting WAL segment for timeline 10 (hint, will be the same segment name with a different timeline prefix). Take a look at the content of this segment:
$ pg_waldump 0000000A0000000100000022
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 1/22000028, prev 1/21000138, desc: RUNNING_XACTS nextXid 1035 latestCompletedXid 1034 oldestRunningXid 1035
rmgr: Heap len (rec/tot): 61/ 1666, tx: 1035, lsn: 1/22000060, prev 1/22000028, desc: DELETE xmax: 1035, off: 4, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/5/16684 blk 0 FPW
rmgr: Heap2 len (rec/tot): 1460/ 1460, tx: 1035, lsn: 1/220006E8, prev 1/22000060, desc: MULTI_INSERT ntuples: 1, flags: 0x02, offsets: [5], blkref #0: rel 1663/5/16684 blk 0
rmgr: Heap2 len (rec/tot): 63/ 63, tx: 1035, lsn: 1/22000CA0, prev 1/220006E8, desc: PRUNE snapshotConflictHorizon: 1034, nredirected: 0, ndead: 4, nunused: 0, redirected: [], dead: [1, 2, 3, 12], unused: [], blkref #0: rel 1663/5/16684 blk 0
rmgr: XLOG len (rec/tot): 42/ 42, tx: 0, lsn: 1/22000CE0, prev 1/22000CA0, desc: END_OF_RECOVERY tli 10; prev tli 3; time 2024-02-06 13:25:33.877840 EST
...
According to the above, the recovery of the database completed at 1:25 PM EST on 2/6. That means that database (timeline 10 and 11), now contains 2 hours of application changes (assuming application was resumed immediately after restore). Now, let's compare this to the last WAL segment in timeline 9:
$ pg_waldump 000000090000000500000057
...
rmgr: Transaction len (rec/tot): 34/ 34, tx: 1661, lsn: 5/570032B0, prev 5/57002B28, desc: COMMIT 2024-02-06 13:23:51.110938 EST
rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 5/570032D8, prev 5/570032B0, desc: RUNNING_XACTS nextXid 1662 latestCompletedXid 1661 oldestRunningXid 1662
...
We can determine the latest WAL segment by reviewing the Postgres logs and/or the pgBackRest repository. In this example, WAL segment 000000090000000500000057 was the latest. This segment was restored and the above pg_waldump
shows the last committed transaction was on 2/6/24 at 1:23:51 PM EST. This means that timeline 9 has 84 hours and some change of application changes. This was determined by measuring the difference of the 'before' timestamp in timeline 11's history file (the time it was forked from timeline 3) and the last transaction in timeline 9.
Back to our question. Does timeline 11 have the latest data? Maybe, it has a few hours at best of later processed data, but to accept that means losing 84 hours of data.
Conclusion
The timeline history files plus some handy investigation work can tell us a story of the databases 'family tree'. If you were called upon to do a restore of the database you can now make some wise choices on which timeline may contain the data most useful to the business.
Before performing a restore, reinitializing standby or replicas. Here are some useful steps to assist you in an investigation to determine what timeline has the most useful data from a business perspective:
- look at pg_controldata to see what timeline you’re database is currently on
- look at the WAL history files present in $PGDATA/pg_wal/*.history
- if necessary, restore missing *.history files from your backup repository/location
- look at when the timelines were created
- examine the contents of the WAL segment(s)
Related Articles
- PostGIS Day 2024 Summary
8 min read
- Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
8 min read
- Loading the World! OpenStreetMap Import In Under 4 Hours
6 min read
- Easy Totals and Subtotals in Postgres with Rollup and Cube
5 min read
- A change to ResultRelInfo - A Near Miss with Postgres 17.1
8 min read