One PID to Lock Them All: Finding the Source of the Lock in Postgres
On the Customer Success Engineering team at Crunchy Bridge, we run across customers with lock issues on their Postgres database from time to time. Locks can have a cascading effect on queries. If one process is locking a table, then a query can be waiting on the process before it, and the process before that one. Major lock issues can quickly take down an entire production Postgres instance or application.
In this post let’s look at why locks happen, and more importantly how to get to the bottom of a lock issue and the one process blocking everything else. That one process that blocks them all! Once you find the source of the lock, I’ll give you the options for terminating the process that created all your troubles in the first place.
Finding the source of the lock
Often you won’t immediately know that you have a lock issue. If something is off, queries aren’t returning, or your application is slow, finding statements blocked by locks is a great place to start.
1. Find processes that are waiting
Take a look at the pg_stat_activity view for processes that are active but have a wait_event or wait_event_type that is non-NULL:
SELECT
pid,
datname,
usename,
application_name,
client_addr,
client_port,
to_char (now (), 'YYYY-MM-DD HH24:MI:SS') as now,
to_char (now () - xact_start, 'DD HH24:MI:SS MS') as xact_time,
to_char (now () - query_start, 'DD HH24:MI:SS MS') as query_time,
state,
to_char (now () - state_change, 'DD HH24:MI:SS MS') as state_time,
wait_event,
wait_event_type,
left (query, 40)
FROM
pg_stat_activity
WHERE
state != 'idle'
and pid != pg_backend_pid ()
ORDER BY
query_time desc;
If a connection is active and waiting on a lock, then the wait_event and wait_event_type columns will be non-NULL. If that's the case (and it stays that way after a couple of runs of the query, to make sure you didn't just catch a short lock wait), record the affected PID. Here is a very simple example where I ran an update in a transaction, then in a different session added a column to the same table. The ALTER TABLE in this case will not proceed until the transaction from the first session has been committed or rolled back. Here are the results. Note PID 295998, which is "active" but has wait_event = relation and wait_event_type = Lock:
pid | datname | usename | application_name | client_addr | client_port | now | xact_time | query_time | state | state_time | wait_event | wait_event_type | left
--------+----------+----------+------------------+-----------------+-------------+---------------------+-----------------+-----------------+---------------------+-----------------+------------+-----------------+------------------------------------------
295995 | postgres | postgres | psql | 149.42.105.253 | 49327 | 2023-11-09 20:41:10 | 00 00:02:11 535 | 00 00:02:01 755 | idle in transaction | 00 00:02:01 755 | ClientRead | Client | RELEASE pg_psql_temporary_savepoint
295998 | postgres | postgres | psql | 149.42.105.253 | 49344 | 2023-11-09 20:41:10 | 00 00:01:55 550 | 00 00:01:01 138 | active | 00 00:01:01 138 | relation | Lock | alter table sampledata add column data02
(2 rows)
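If you'd like to reproduce this scenario yourself, here's a rough sketch assuming the sampledata table used throughout this post (the column names are just placeholders):
-- Session 1: open a transaction, update a row, and leave the transaction open
BEGIN;
UPDATE sampledata SET data = 'abc' WHERE id = 101;
-- (no COMMIT or ROLLBACK yet)

-- Session 2: try to add a column; this will wait on session 1's lock
ALTER TABLE sampledata ADD COLUMN data02 integer;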
2. Find which PID is locking the table
Now we know that PID 295998 is waiting on a lock on a relation (a table), but we don't know which process currently holds the lock it is waiting for. To find out, we start by querying pg_locks using the PID of the waiting process:
SELECT
*
FROM
pg_locks
WHERE
pid = 295998
AND granted IS NOT true;
Here’s the result of that query:
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+---------------------+---------+----------+------------------------------
relation | 5 | 16501 | | | | | | | | 6/6743 | 295998 | AccessExclusiveLock | f | f | 2023-11-09 20:40:08.98843+00
(1 row)
The locktype column shows which of the other columns describe what Postgres is waiting on. In this example, locktype is relation, so we look to the relation column to see the OID of the relation (16501) where the blocking process has an active lock.
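As a side note, if you'd rather see the table name than the OID, you can cast the relation column to regclass right in the query, for example:
SELECT pid, locktype, relation::regclass AS relname, mode, granted
FROM pg_locks
WHERE pid = 295998
  AND granted IS NOT true;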
3. Find the process with the existing lock
Now that we know which object is locked, we can once again query pg_locks using the relation OID to see what is holding the current lock(s):
SELECT
*
FROM
pg_locks
WHERE
relation = 16501
AND granted IS true;
Here is the result:
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+--------+------------------+---------+----------+-----------
relation | 5 | 16501 | | | | | | | | 3/243227 | 295995 | RowExclusiveLock | t | f |
(1 row)
This shows that PID 295995 is the process holding the lock.
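On Postgres 9.6 and newer there's also a shortcut: the pg_blocking_pids() function (which the rollup query later in this post relies on) returns the blocking PIDs directly, without the manual pg_locks lookup:
-- returns the array of PIDs blocking the waiting process, {295995} in this example
SELECT pg_blocking_pids(295998);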
4. Find what that blocking process is doing
Now that we know which process has been granted the lock, we can go back to pg_stat_activity to see what that PID is doing:
SELECT
pid,
state,
wait_event,
wait_event_type,
left (query, 40)
FROM
pg_stat_activity
WHERE
pid = 295995;
Here is the result:
pid | state | wait_event | wait_event_type | left
--------+---------------------+------------+-----------------+-------------------------------------
295995 | idle in transaction | ClientRead | Client | RELEASE pg_psql_temporary_savepoint
The last column shows the last statement executed by that session, which in this case was the savepoint release after an update; in most cases it will show the statement from the still-open transaction.
One lock to rule them all
The above statements are pretty straightforward once you know what you are looking for, but they can also be combined into a single statement for a general blocking / blocked query. The Postgres wiki has some good combined versions.
Often you might find that the blocked statement is itself blocked by another (and another, and another still…). In those cases, it is still possible to trace all the way up to the One PID that blocks all the rest, but that can be an arduous, unexpected journey. For those cases, a colleague here at Crunchy Data, Brian Pace, wrote a query that shows locks waiting on other locks, rolling up to the PID holding the initial lock:
WITH sos AS (
SELECT array_cat(array_agg(pid),
array_agg((pg_blocking_pids(pid))[array_length(pg_blocking_pids(pid),1)])) pids
FROM pg_locks
WHERE NOT granted
)
SELECT a.pid, a.usename, a.datname, a.state,
a.wait_event_type || ': ' || a.wait_event AS wait_event,
current_timestamp-a.state_change time_in_state,
current_timestamp-a.xact_start time_in_xact,
l.relation::regclass relname,
l.locktype, l.mode, l.page, l.tuple,
pg_blocking_pids(l.pid) blocking_pids,
(pg_blocking_pids(l.pid))[array_length(pg_blocking_pids(l.pid),1)] last_session,
coalesce((pg_blocking_pids(l.pid))[1]||'.'||coalesce(case when locktype='transactionid' then 1 else array_length(pg_blocking_pids(l.pid),1)+1 end,0),a.pid||'.0') lock_depth,
a.query
FROM pg_stat_activity a
JOIN sos s on (a.pid = any(s.pids))
LEFT OUTER JOIN pg_locks l on (a.pid = l.pid and not l.granted)
ORDER BY lock_depth;
Example output from that statement:
pid | usename | datname | state | wait_event | time_in_state | time_in_xact | relname | locktype | mode | page | tuple | blocking_pids | last_session | lock_depth | query
--------+-------------+----------+---------------------+---------------------+-----------------+-----------------+------------+---------------+---------------------+------+-------+------------------------+--------------+------------+----------------------------------------------------
879401 | application | postgres | idle in transaction | Client: ClientRead | 00:29:53.512147 | 00:30:01.31748 | | | | | | | | 879401.0 | select * from sampledata where id=101 for update;
880275 | application | postgres | active | Lock: transactionid | 00:01:00.342763 | 00:01:00.459375 | | transactionid | ShareLock | | | {879401} | 879401 | 879401.1 | update sampledata set data = 'abc' where id = 101;
880204 | application | postgres | active | Lock: relation | 00:00:29.722705 | 00:00:29.722707 | sampledata | relation | AccessExclusiveLock | | | {879401,880275,879488} | 879488 | 879401.4 | alter table sampledata add column data03 integer;
880187 | application | postgres | active | Lock: relation | 00:00:03.580716 | 00:00:03.580718 | sampledata | relation | RowExclusiveLock | | | {880204} | 880204 | 880204.2 | update sampledata set data = 'abc' where id = 103;
879527 | application | postgres | active | Lock: relation | 00:00:14.974433 | 00:28:32.80346 | sampledata | relation | RowExclusiveLock | | | {880204} | 880204 | 880204.2 | update sampledata set data = 'abc' where id = 102;
879488 | application | postgres | active | Lock: tuple | 00:00:41.35361 | 00:00:41.47118 | sampledata | tuple | ExclusiveLock | 2 | 21 | {880275} | 880275 | 880275.2 | update sampledata set data = 'def' where id = 101;
(6 rows)
In this manufactured example we have:
- 879401 - the "idle in transaction" PID. This is a SELECT ... FOR UPDATE within a transaction. Its blocking_pids field is blank because it's not blocked by any other process. This is the process in this example that is blocking everything else.
- 880275 - Attempting to update the same id=101. It's blocked until the FOR UPDATE is completed.
- 879488 - Again attempting to update the same id=101. It can't execute until the process blocking it completes. It's waiting on 880275 since it came in afterwards. If 880275 is canceled, it will just roll up to the next blocker, 879401.
- 880204 - Here we added an ALTER TABLE. Since it takes an ACCESS EXCLUSIVE lock, its blocking_pids shows all three of the prior statements; it won't be able to execute until each of those is out of the way.
- 879527 - Blocked by the ALTER TABLE: even though the ALTER TABLE hasn't been granted its ACCESS EXCLUSIVE lock yet, later lock requests queue up behind it. Note that it's still blocked, even though it's updating a different row (id=102).
- 880187 - Also blocked by the ALTER TABLE. It's at the same lock_depth as 879527 since they are both blocked by the same thing, but not by each other.
Ending the process holding the lock
Ok, now we’ve found the PID at the top of the tree, that one lock holding the key to the rest of our locks. Fortunately, as Postgres wizards, we do possess the craft to unmake the lock.
Commit
If the statement is showing as idle in transaction, it is possible that you have an uncommitted transaction open that started with a BEGIN statement. In that case you can commit it with:
COMMIT;
Rollback
You may have performed some unintended updates, or run into an error. In that case you can abort the transaction and rollback any changes already made with:
ROLLBACK;
Cancel the PID
If this wasn't a transaction you initiated, in most cases you can cancel the running query with:
SELECT pg_cancel_backend(PID);
Terminate the backend connection and process
If the cancel statement above doesn’t work (for example, because the blocking session is idle in transaction, so there is no running query to cancel), you can cast the lock back into the fiery chasm from whence it came by terminating the backend. This will end the process and its associated database connection.
SELECT pg_terminate_backend(PID);
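If a whole chain of sessions is involved and you simply want to clear everything blocking a given PID, here is a sketch that combines pg_blocking_pids() with pg_terminate_backend(). Use it with care: it terminates every session in the blocking set, so check what those sessions are doing first.
-- terminate every backend currently blocking PID 295998
SELECT pg_terminate_backend(blocker)
FROM unnest(pg_blocking_pids(295998)) AS t(blocker);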
Why did Postgres lock?
Postgres’ multi-version concurrency control (MVCC) system is incredibly advanced and by and large lets you query, update, and insert rows without locking tables. There are two main kinds of locks:
- Shared locks - the resource can be accessed by more than one backend/session at the same time
- Exclusive locks - the resource can only be accessed by a single backend/session at a time
The lock type that generally gets us into trouble and blocks other queries and processes is the exclusive lock. If you want an overview, see David’s post, Postgres Locking: When Is It Concerning? There are probably hundreds of ways to put an exclusive lock on a table, but these are the most common ones we see with our customers.
Alter Table
By far the most common event I see take an exclusive and detrimental lock is an ALTER TABLE command, which can be issued to the database directly or, in some cases, via the application’s ORM while running migrations. The ALTER TABLE itself takes an ACCESS EXCLUSIVE lock (see the ALTER TABLE docs), which blocks pretty much every other process on that table.
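One way to make migrations safer, assuming you can retry them, is to set a lock_timeout in the migration session so the ALTER TABLE fails fast instead of queuing up behind a long-running transaction (the column name here is just a placeholder):
SET lock_timeout = '5s';
-- errors out instead of waiting if the lock can't be acquired within 5 seconds
ALTER TABLE sampledata ADD COLUMN data04 integer;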
ORM framework
ORM frameworks can hide circular dependencies that produce deadlocks. An error on the application side, where an operation fails partway through but the surrounding transaction is left open, can leave locks held and result in later transactions taking a long time to complete.
Create index
Creating an index takes a lock that blocks writes to the table for the duration of the build unless you use CREATE INDEX CONCURRENTLY.
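For example, on a busy table you would generally reach for something like this instead of a plain CREATE INDEX (the index name here is a placeholder):
-- builds the index without blocking writes; note it cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_sampledata_data ON sampledata (data);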
Vacuum
VACUUM FULL will take out an ACCESS EXCLUSIVE lock against a table, so it should be used only in rare cases.
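In most cases a plain VACUUM (optionally with ANALYZE) is what you want instead; it does not take an ACCESS EXCLUSIVE lock and can run alongside normal traffic:
-- reclaims space for reuse and updates planner statistics without an exclusive lock
VACUUM (ANALYZE) sampledata;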
Other
The Postgres documentation has a table showing the different lock modes, how they might block each other, and some examples of statement types that result in those locks.
Getting proactive about locks
Let’s look at a few tips for managing locking in the future.
Logging lock_waits
You can log any time your query is waiting on a lock by turning on log_lock_waits
. Lock_waits in your logs can be a good indicator that processes are being contentious. There is virtually no overhead on enabling this and it’s very safe for production databases. This is set to “on” by default on Crunchy Bridge clusters:
log_lock_waits = on
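If you manage Postgres yourself, one way to turn it on without editing postgresql.conf by hand is:
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();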
Set a lock timeout
We generally recommend clients set a lock_timeout so that a statement that cannot acquire a lock within that time is canceled instead of sitting in the lock queue indefinitely. This helps prevent other processes from getting caught up behind it in a chain.
ALTER SYSTEM SET lock_timeout = '10s';
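The ALTER SYSTEM form above sets the default for the whole cluster; you can also scope the setting to a single session or to a specific role (app_user here is just a placeholder), which is often a better fit for migration sessions:
-- current session only
SET lock_timeout = '10s';
-- every new session for a given role
ALTER ROLE app_user SET lock_timeout = '10s';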
Summary
- Find processes waiting on locks in pg_stat_activity by looking for processes that are active but have a wait_event or wait_event_type that is non-NULL.
- Use this query to find the source of the lock (seriously, save this query somewhere, you might need it someday).
WITH sos AS (
SELECT array_cat(array_agg(pid),
array_agg((pg_blocking_pids(pid))[array_length(pg_blocking_pids(pid),1)])) pids
FROM pg_locks
WHERE NOT granted
)
SELECT a.pid, a.usename, a.datname, a.state,
a.wait_event_type || ': ' || a.wait_event AS wait_event,
current_timestamp-a.state_change time_in_state,
current_timestamp-a.xact_start time_in_xact,
l.relation::regclass relname,
l.locktype, l.mode, l.page, l.tuple,
pg_blocking_pids(l.pid) blocking_pids,
(pg_blocking_pids(l.pid))[array_length(pg_blocking_pids(l.pid),1)] last_session,
coalesce((pg_blocking_pids(l.pid))[1]||'.'||coalesce(case when locktype='transactionid' then 1 else array_length(pg_blocking_pids(l.pid),1)+1 end,0),a.pid||'.0') lock_depth,
a.query
FROM pg_stat_activity a
JOIN sos s on (a.pid = any(s.pids))
LEFT OUTER JOIN pg_locks l on (a.pid = l.pid and not l.granted)
ORDER BY lock_depth;
- End the lock by canceling the PID or issuing a COMMIT or ROLLBACK of the process that’s holding the lock and blocking the other processes.
- Be careful with ALTER TABLE commands, CREATE INDEX without the CONCURRENTLY part, or runaway processes from your ORM that may be holding exclusive locks and blocking general database processing.
- It can be a good idea to set a lock_timeout, and it is generally a good idea to log lock waits if you’re doing proactive logging to keep track of ongoing problems.
Thanks to my colleague Brian Pace for the great cascading locks query.