Postgres Locking: When is it Concerning?
When using monitoring tools like PgMonitor or pganalyze, Crunchy clients will often ask me about high numbers of locks and when to worry. Like most engineering-related questions, the answer is: "it depends".
In this post, I will provide a little more information about locks, how they are used in PostgreSQL, and what things to look for to spot problems vs high usage.
PostgreSQL uses locks in all parts of its operation to serialize or share access to key data. This can come in the form of two basic types of locks: shared or exclusive.
Shared locks - the particular resource can be accessed by more than one backend/session at the same time.
Exclusive locks - the particular resource can only be accessed by a single backend/session at a time.
The same resource can have different locks taken against it at the same time, with the same or differing strengths.
Every statement run in a PostgreSQL session runs in a transaction: either one explicitly created via transaction control statements (BEGIN/COMMIT), or an implicit transaction created for a single statement.
When PostgreSQL takes a lock, it takes it for the duration of the transaction; a lock is never explicitly released except by the final termination of the transaction. One of the reasons for this is snapshot consistency: it ensures proper dependencies are in place for existing transactions. Once the transaction is over, those guarantees no longer matter, so the locks can be released.
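As a quick illustration (the table name here is a hypothetical example), an AccessShareLock taken by a SELECT stays visible in pg_locks until the transaction ends:

```sql
BEGIN;
SELECT count(*) FROM my_table;  -- takes AccessShareLock on my_table

-- From another session, the lock remains visible until COMMIT/ROLLBACK:
SELECT locktype, relation::regclass, mode, granted
FROM pg_locks
WHERE relation = 'my_table'::regclass;

COMMIT;  -- the AccessShareLock is released here
```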
It is worth noting that PostgreSQL (and any multi-process system) uses locks internally for other accesses besides SQL-level backends and transactions.
If you're reading this far and are curious about monitoring, you are likely already familiar with the pg_locks view. This is a system view that exposes the current state of the built-in lock arrays. The fields available here and their documentation can vary across versions; select your PostgreSQL version from this page for details.
The documentation provides a lot of detail about this view. The important thing to know here is this is the primary way for monitoring/reviewing this system. Some of the relevant fields are:
| Column name | Data type | Description |
|-------------|-----------|-------------|
| granted | boolean | Whether this backend successfully acquired the lock |
| mode | text | The mode of the given lock request |
| pid | int | The backend pid that holds or requested this lock |
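For example, a quick summary of the lock modes currently in play can be had with a simple aggregate over this view:

```sql
-- Count current locks by mode; high AccessShareLock counts alone
-- are normal on a busy system.
SELECT mode, count(*)
FROM pg_locks
GROUP BY mode
ORDER BY count(*) DESC;
```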
Of particular note is the "granted" field, a boolean which shows whether the given lock has been granted to the backend in question. If there is an ungranted lock (granted = f), then the backend is blocked waiting for that lock. Until the process that successfully holds the lock completes in some way (i.e., either commits or rolls back), this process will be stuck in limbo and will not be able to proceed.
A related system view that can be used for more information about PostgreSQL backend processes is the venerable pg_stat_activity view, in particular the wait_event field. The wait_event field will show whether a given backend process is currently waiting for a lock, either a "heavyweight" lock (wait_event_type = Lock) or a "lightweight" lock (wait_event_type = LWLock).
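A quick sketch of checking for lock waits via this view, assuming a recent PostgreSQL version where these columns are available:

```sql
-- Show backends currently waiting on heavyweight or lightweight locks
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type IN ('Lock', 'LWLock');
```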
Regular lock usage
When a query accesses a table for a SELECT statement, it takes an AccessShare lock against that table. If a query accesses multiple tables, it will take locks against each of those tables. Depending on your query patterns and transaction lengths, you could end up with dozens or even hundreds of AccessShare locks per backend connection without this being indicative of an issue. This is also why strictly looking at the count of locks in pg_locks is not necessarily a useful metric for database problems. If there are a high number of connections running queries, or if the workload changes (say, with an application deployment), you can see high numbers of locks without there being an issue.
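Rather than a raw total, a per-backend breakdown can help distinguish normal high usage from a genuine pileup; a simple sketch:

```sql
-- Which backends hold the most locks? A wide, even spread is usually
-- just normal usage; one pid holding far more than the rest may merit
-- a closer look at its query and transaction length.
SELECT pid, count(*) AS lock_count
FROM pg_locks
GROUP BY pid
ORDER BY lock_count DESC
LIMIT 10;
```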
What is an issue then?
While high numbers of locks do not necessarily indicate a problem, some problems can result in high numbers of locks. For example, if a query is not running efficiently and thus takes a long time, a large number of backed-up connections can accumulate, resulting in additional lock buildup as the backends wait for the resource to be freed.
An ungranted lock for any significant length of time indicates an issue and is something that should be looked into.
```sql
SELECT COUNT(*) FROM pg_locks WHERE NOT granted;
```
Note that depending on when this query is run, brief instances of ungranted locks can appear. But if the same lock persists across a second invocation of this query, that likely indicates a larger issue.
If you do have an ungranted lock, you will want to look at the process that currently has the lock; this is the process that would be misbehaving. To do this, you can run the following query to get the information about the specific backend and the query it is running:
```sql
SELECT pid, pg_blocking_pids(pid), wait_event, wait_event_type, query
FROM pg_stat_activity
WHERE backend_type = 'client backend' AND wait_event_type ~ 'Lock';
```
The pid column will be the process that is blocked, while the pg_blocking_pids() function will return an array of pids that are currently blocking this process from running. (Effectively, this is a list of processes holding a lock that the pid backend is waiting on.) Depending on what this process is doing, you may want to take some sort of corrective action, such as canceling or terminating that backend. (The correct course of action here will depend on your specific application.)
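If you do decide to intervene, PostgreSQL provides two built-in functions for this; the pid below is a placeholder for an actual blocking pid found via pg_blocking_pids():

```sql
-- Ask the blocking backend to cancel its current query only:
SELECT pg_cancel_backend(12345);

-- If cancellation is not sufficient, terminate the backend entirely,
-- rolling back its open transaction:
SELECT pg_terminate_backend(12345);
```

Canceling is the gentler option, since the session survives; termination drops the connection, which some applications handle less gracefully.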
Since locks are just a normal way that PostgreSQL controls access to its resources, high locks can be expected with high usage. So whether high numbers of locks are an indication of problems can depend on what those locks are and whether any additional issues are seen in the system proper.
If IO usage is very high, you will often see the DataFileRead wait event, which can affect multiple backends. If IO is overloaded, any process trying to read files from disk will be in this state. Issuing more IO operations will not accomplish any more reading; the IO bandwidth of a system is finite, and if you are already at its limits, adding more requests will only further fragment and split the resource among additional backends.
If the system is reading in high numbers of buffers, or has a lot of contention for the same buffers (say, attempting to vacuum buffers in use by other processes), you could end up encountering a BufferContent LWLock. This is a lock that is basically seen when trying to concurrently load large numbers of buffers. Multiple shared locks are used to ensure that there is not a single lock guarding every buffer page load, but this is still a finite resource, so in times of high load you can see it show up as a blocking process. Any one lock is likely very brief, but in periods of high load, you will see these registered in pg_stat_activity quite frequently.
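Because any individual LWLock is held so briefly, sampling is the practical way to spot them; repeating a query like this sketch every few seconds shows which lightweight locks dominate:

```sql
-- Snapshot of current LWLock waits, grouped by event name.
-- Run repeatedly; consistent high counts (e.g. BufferContent)
-- are more meaningful than any single sample.
SELECT wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
GROUP BY wait_event
ORDER BY count(*) DESC;
```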
Depending on your system's transaction volume and types of transactions, you could see lots of queries waiting on one of several SLRU locks, either on a primary or a replica; these guard the various SLRU caches, such as the transaction status and subtransaction caches.
Clients have also run into some questions about advisory locks, particularly when using a transaction-level connection pooler such as PgBouncer. The explicit advisory lock functions in PostgreSQL allow the user to access lock primitives in their application code and allow the serialization of resources at the application level, which can be particularly useful when trying to coordinate access with external systems. That said, there can be issues encountered if not using these primitives properly.
Of particular note, if the user calls pg_advisory_lock() from application code while using a database pooler, they can end up with either deadlocks or confusing behavior due to the potential for different database sessions being used. Since the pg_advisory_lock() function grabs a lock for its current database session (not the current transaction), multiple calls could end up getting run against different backends (since PgBouncer would use a fairly arbitrary backend for separate transactions).
Since the locks are being taken and potentially released in separate sessions (even from the same application database connection), there is no guarantee that the resource they are intending to serialize access to is being done in a consistent manner. PgBouncer specifically recommends against the usage of these session-based locking functions for just this reason.
Applications using a database pool should look at using transaction-based locks in order to serialize these accesses; i.e., pg_advisory_xact_lock() and friends. If this is not possible, then a separate database pool in session mode, which allows the session handling to work as expected, should be utilized.
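A minimal sketch of the transaction-scoped pattern; the lock key (42) is an arbitrary example chosen by the application:

```sql
BEGIN;
-- The lock is tied to this transaction, so it behaves correctly even
-- under a transaction-level pooler like PgBouncer:
SELECT pg_advisory_xact_lock(42);
-- ... perform the serialized work here ...
COMMIT;  -- lock released automatically, on COMMIT or ROLLBACK alike
```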
Even outside of a database pooler, pg_advisory_lock() comes with its own set of issues. It does not release the lock even if the transaction that created it rolls back, and it can take careful coordination and exception handling on the part of the application code to use it effectively.
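This surprising interaction with rollback can be seen directly (again with an arbitrary example key):

```sql
BEGIN;
SELECT pg_advisory_lock(42);
ROLLBACK;  -- the session-level advisory lock is NOT released here

-- The lock is still held by this session and must be released explicitly:
SELECT pg_advisory_unlock(42);
```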
I hope this article has given you a little insight into what sorts of locking might be of concern at the application level. These situations are ones which may warrant investigation and/or application changes:
- Ungranted locks
- High numbers of LWLocks showing up consistently in pg_stat_activity
- Session-level advisory locks
July 1, 2022