PostGIS Performance: pg_stat_statements and Postgres tuning

Paul Ramsey

In this series, we talk about the many different ways you can speed up PostGIS. Today let’s talk about looking across all your queries with pg_stat_statements, and about some basic system tuning.

Showing Postgres query times with pg_stat_statements

A reasonable question to ask, if you are managing a system with variable performance, is: “what queries on my system are running slowly?”

Fortunately, PostgreSQL includes an extension called “pg_stat_statements” that tracks query performance over time and maintains a list of high cost queries.

CREATE EXTENSION pg_stat_statements;
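
The module also has to be loaded at server start before it will collect anything. If querying the view complains that pg_stat_statements must be loaded via shared_preload_libraries, add it to postgresql.conf and restart:

# postgresql.conf
shared_preload_libraries = 'pg_stat_statements'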

Now leave your database running for a while, so the extension can gather data about the kinds of queries being run against it.

Once it has been running for a while, you have a view – pg_stat_statements – that exposes your query statistics. You can query it directly with SELECT *, or you can write individual queries to find the slowest queries, the longest-running ones, and so on.

Here is an example that returns the 10 slowest queries, ranked by mean execution time.

SELECT
  total_exec_time,
  mean_exec_time,
  calls,
  rows,
  query
FROM pg_stat_statements
WHERE calls > 0
ORDER BY mean_exec_time DESC
LIMIT 10;

While “pg_stat_statements” is good at finding individual queries to tune, the most frequent cause of slow queries is just inefficient SQL or a need for indexing (see the first post in the series).

Occasionally, though, performance issues do crop up at the system level. The most frequent culprit is memory pressure. PostgreSQL ships with conservative default settings for memory usage, and some workloads benefit from more memory.

Shared buffers

A database server looks like an infinite, accessible, reliable bucket of data. In fact, the server orchestrates data between the disk – which is permanent and slow – and the random access memory – which is volatile and fast – in order to provide the illusion of such a system.

When the balance between slow storage and fast memory is out of whack, system performance falls. When the data being read is already in fast memory, that is a “cache hit”; when it is not, the read has to continue on to the slow disk, a “cache miss”.

You can check the balance of your system by looking at the “cache hit ratio”.

SELECT
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit)  as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) +  sum(heap_blks_read)) as ratio
FROM
  pg_statio_user_tables;

A ratio in the 99% range is a good sign. Below 90% means that your database could be memory constrained, so increasing the “shared_buffers” parameter may help. As a general rule, “shared_buffers” should be about 25% of physical RAM.
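
As a sketch, assuming a server with 16GB of physical RAM, the 25% rule suggests a 4GB setting. Note that shared_buffers is only read at server start, so a restart is required after changing it.

ALTER SYSTEM SET shared_buffers = '4GB';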

Working memory

Working memory is controlled by the “work_mem” parameter, which sets how much memory is available for in-memory sorting, index building, and other short-term operations. Unlike the “shared buffers”, which are fully allocated at startup and held permanently, “working memory” is allocated on an as-needed basis.

However, the working memory limit is applied per operation (a sort or a hash) in each database connection, so it is possible for the total working memory in use to radically exceed the “work_mem” value. If 1000 connections each allocate 100MB, your server will probably run out of memory.
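
As a quick sanity check, compare the per-operation limit with the number of connections currently open:

SHOW work_mem;

SELECT count(*) FROM pg_stat_activity;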

You can speed up known memory-hungry processes, like building spatial indexes, by temporarily increasing the working memory available to your particular connection, then reducing it when the process is complete.

SET work_mem = '2GB';
CREATE INDEX roads_geom_x ON roads USING GIST (geom);
SET work_mem = '100MB';

The same principle holds for maintenance tasks, like the “VACUUM” command. You can speed up the maintenance of a large table by increasing the “maintenance_work_mem” temporarily.

SET maintenance_work_mem = '2GB';
VACUUM roads;
SET maintenance_work_mem = '128MB';

Parallelism

It is common for modern database servers to have multiple CPU cores available, but your PostgreSQL configuration may not be tuned to use them all. Postgres does have parallel query support, but it is conservative about using multiple cores, because executing and coordinating multi-process queries has overhead. In general, though, large aggregations and scans can make effective use of two to four cores at once.

Check what limits are set on your database.

SHOW max_worker_processes;

SHOW max_parallel_workers;

Setting the maximums to the number of cores on your server is good practice. In particular, don’t be afraid to reduce the number of workers if you have fewer cores – there is no benefit to be had in workers contending for cores.
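
As a sketch, assuming an 8-core server, you could align the limits with the hardware:

ALTER SYSTEM SET max_worker_processes = 8;
ALTER SYSTEM SET max_parallel_workers = 8;

-- max_worker_processes is only read at server start, so changing it
-- requires a restart; max_parallel_workers takes effect on reload.
SELECT pg_reload_conf();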

Tuning Postgres basics

To wrap up:

  • Check the slowest queries with pg_stat_statements.
  • Use EXPLAIN and indexing to experiment with improvements (see the sketch after this list).
  • Check for memory and system-level bottlenecks by looking at:
    • shared buffers (shared_buffers)
    • working memory (work_mem)
    • parallelism
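
As a minimal sketch of that second step, assuming the roads table from the examples above has a spatial index on its geom column, EXPLAIN ANALYZE shows whether the planner actually uses that index (the envelope coordinates and SRID are placeholders):

EXPLAIN ANALYZE
SELECT count(*)
FROM roads
WHERE geom && ST_MakeEnvelope(0, 0, 1, 1, 4326);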

After you do some tuning, don’t forget to reset pg_stat_statements and check again to see if/how things have improved!
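
The extension ships with a reset function for exactly that purpose:

SELECT pg_stat_statements_reset();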