Postgres Locking: When is it Concerning?
When using monitoring tools like PgMonitor or pganalyze, Crunchy clients will often ask me about high numbers of locks and when to worry. Like most engineering-related questions, the answer is: "it depends".
In this post, I will provide a little more information about locks, how they are used in PostgreSQL, and what to look for to distinguish genuine problems from ordinary high usage.
PostgreSQL uses locks in all parts of its operation to serialize or share access to key data. This can come in the form of two basic types of locks: shared or exclusive.
- Shared locks: the particular resource can be accessed by more than one backend/session at the same time.
- Exclusive locks: the particular resource can only be accessed by a single backend/session at a time.
The same resource can have different locks taken against it, with the same or differing strengths.
Lock duration
Every statement run in a PostgreSQL session runs in a transaction: either one explicitly created via transaction control statements (BEGIN, COMMIT, etc.) or an implicit transaction created for a single statement.
When PostgreSQL takes a lock, it takes it for the duration of the transaction; a lock is never explicitly released except by the final termination of the transaction. One of the reasons for this is snapshot consistency and ensuring proper dependencies are in place for existing transactions. Once the transaction is over, those concerns no longer apply, so the locks can be released.
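As a minimal illustration (using a hypothetical table named accounts), a lock taken inside an explicit transaction is held until that transaction ends:

BEGIN;
LOCK TABLE accounts IN SHARE MODE;  -- lock acquired here; "accounts" is a made-up example table
-- ... other statements in the same transaction ...
COMMIT;                             -- the lock is released here, not before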
It is worth noting that PostgreSQL (and any multi-process system) uses locks internally for other accesses besides SQL-level backends and transactions.
Monitoring locks
If you're reading this far and are curious about monitoring, you are likely already familiar with the pg_locks view. This is a system view that exposes the current state of the built-in lock arrays. The fields available and their documentation can vary across versions; select your PostgreSQL version from this page for details.
The documentation provides a lot of detail about this view. The important thing to know is that it is the primary interface for monitoring and reviewing lock activity in the system. Some of the relevant fields are:
| Column name | Data type | Description |
|---|---|---|
| granted | boolean | Whether this backend successfully acquired the lock |
| mode | text | The mode of the given lock request |
| pid | int | The backend pid that holds or requested this lock |
Of particular note is the "granted" field, a boolean which shows whether the given lock has been granted to the backend in question. If there is an ungranted lock (i.e., granted = f), then the backend is blocked waiting for a lock. Until the process that holds the lock completes in some way (i.e., either commits or rolls back), the waiting process will be stuck in limbo and will not be able to proceed.
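For example, you could run something like the following to see which backends are currently waiting and on what (the relation column will be NULL for non-relation lock types):

SELECT pid, locktype, mode, relation::regclass AS relation
FROM pg_locks
WHERE NOT granted;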
A related system view that can be used for more information about PostgreSQL backend processes is the venerable pg_stat_activity view, in particular the wait_event field. The wait_event field will show, for a given backend process, whether it is currently waiting for a lock, either a "heavyweight" lock or a "lightweight" lock (indicated by wait_event_type = LWLock).
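As a quick point-in-time check (a sketch; adjust the filtering to taste), you can list the backends currently waiting on either kind of lock:

SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type IN ('Lock', 'LWLock');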
Regular lock usage
When a query accesses a table for a SELECT statement, it takes an AccessShare lock against that table. If a query accesses multiple tables, it will take locks against each of these tables. Depending on your query patterns and transaction lengths, you could end up with dozens or even hundreds of AccessShare locks per backend connection without this being indicative of an issue. This is also a reason why strictly looking at the count of locks in pg_locks as a metric for issues in the database isn't necessarily useful. If there are a high number of connections running queries, or if the workload changes (say, with an application deployment), this can cause high numbers of locks without there being a problem.
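If you want to see what is actually behind a high lock count, one simple approach is to group the current pg_locks entries by mode; a pile of AccessShareLock entries is usually just normal read traffic:

SELECT mode, count(*)
FROM pg_locks
GROUP BY mode
ORDER BY count(*) DESC;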
What is an issue then?
While a high number of locks does not necessarily indicate a problem, some problems can result in a high number of locks. For example, if a query is not running efficiently and thus takes a long time, there can be a large number of backed-up connections, resulting in additional lock buildup as the backends wait for the resource to be freed.
An ungranted lock that persists for any significant length of time indicates an issue and is something that should be looked into:
SELECT COUNT(*) FROM pg_locks WHERE NOT granted;
Note that depending on when this query is run, you may catch brief, transient ungranted locks. If the same lock is still ungranted on a second invocation of this query, however, that likely indicates a larger issue.
Investigating more
If you do have an ungranted lock, you will want to look at the process that currently has the lock; this is the process that would be misbehaving. To do this, you can run the following query to get the information about the specific backend and the query it is running:
SELECT pid, pg_blocking_pids(pid), wait_event, wait_event_type, query
FROM pg_stat_activity
WHERE backend_type = 'client backend'
AND wait_event_type ~ 'Lock';
Here, the pid column identifies the process that is blocked, while the pg_blocking_pids() function returns an array of pids that are currently blocking that process from running. (Effectively, this is a list of processes holding a lock that the pid backend is waiting on.) Depending on what this process is doing, you may want to take some sort of corrective action, such as canceling or terminating that backend. (The correct course of action here will depend on your specific application.)
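If you do decide that intervention is warranted, PostgreSQL provides pg_cancel_backend() and pg_terminate_backend() for this. As a sketch, assuming the blocking pid were 12345 (a made-up value):

SELECT pg_cancel_backend(12345);    -- cancel only the blocker's current query
SELECT pg_terminate_backend(12345); -- terminate the blocking backend/session entirely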
Non-blocking locks
Since locks are just a normal way that PostgreSQL controls access to its resources, high lock counts can be expected with high usage. So whether a high number of locks is an indication of problems depends on what those locks are and whether any additional issues are seen on the system itself.
If IO usage is very high, you will often see backends waiting on data file reads (the DataFileRead wait event), and this can affect multiple backends at once. If IO is overloaded, any process trying to read files from disk will be in this state, and issuing more IO operations will not accomplish any more reading; the IO bandwidth of a system is finite, and if you are already at the limits of the system, adding more requests will only further split the resource among additional backends.
If the system is reading in high numbers of buffers, or has a lot of contention for the same buffers (say, attempting to vacuum pages in use by other processes), you could end up encountering a BufferContent LWLock. This lock is typically seen when trying to concurrently load large numbers of buffers. There are multiple shared locks used so that a single lock is not guarding every buffer page load, but this is still a finite resource, so in times of high load it can show up as a blocking process. Any one lock is likely very brief, but in periods of high load you will see these registered in pg_stat_activity quite frequently.
Depending on your system's transaction volume and types of transactions, you could see lots of queries waiting on one of several SLRU locks, either on a primary or a replica. There are several types here, including SubtransSLRU and MultiXactSLRU.
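To get a feel for which of these are showing up on your system, one rough approach is to summarize the current LWLock waits (this is a point-in-time snapshot, so sample it repeatedly):

SELECT wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
GROUP BY wait_event
ORDER BY count(*) DESC;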
Advisory locks
Clients have also raised questions about advisory locks, particularly when using a transaction-level connection pooler such as PgBouncer. The explicit advisory lock functions in PostgreSQL allow the user to access lock primitives in their application code and serialize access to resources at the application level, which can be particularly useful when trying to coordinate access with external systems. That said, issues can arise if these primitives are not used properly.
Of particular note, if an application calls pg_advisory_lock() while connecting through a database pooler, it can end up with either deadlocks or confusing behavior due to the potential for different database sessions being used. Since the pg_advisory_lock() function grabs a lock for its current database session (not the current transaction), multiple pg_advisory_lock() calls could end up being run against different backends (since PgBouncer routes separate transactions to fairly arbitrary server connections).
Since the locks are being taken and potentially released in separate sessions (even from the same application database connection), there is no guarantee that the resource they are intending to serialize access to is being done in a consistent manner. PgBouncer specifically recommends against the usage of these session-based locking functions for just this reason.
Applications using a database pool should look at using transaction-based locks to serialize these accesses, i.e., pg_advisory_xact_lock() and friends. If this is not possible, then a separate database pool in session mode, which allows the session handling to work as expected, should be used.
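As a minimal sketch of the transaction-based approach (the key 42 here is an arbitrary, application-chosen value):

BEGIN;
SELECT pg_advisory_xact_lock(42);  -- blocks until the lock is available
-- ... work that must be serialized across sessions ...
COMMIT;                            -- the advisory lock is released automatically at transaction end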
Note that pg_advisory_lock() comes with its own set of issues outside of a database pooler. It does not release the lock even if the transaction that acquired it rolls back, so it can take careful coordination and exception handling on the part of the application code to use it effectively.
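For example (again with an arbitrary key of 42), a session-level advisory lock survives a rollback and must be released explicitly:

BEGIN;
SELECT pg_advisory_lock(42);
ROLLBACK;                       -- the advisory lock is still held
SELECT pg_advisory_unlock(42);  -- it must be released explicitly (or it persists until session end)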
Final Thoughts
I hope this article has given you a little insight into what sorts of locking might be of concern at the application level. These situations are ones which may warrant investigation and/or application changes:
- Ungranted locks
- High numbers of LWLocks showing up consistently in pg_stat_activity
- Session-level advisory locks