When applications store and retrieve data from databases, it is important to monitor which queries are performing well and which are not. Because data is constantly being added and changed, and applications evolve as needs change, this monitoring should be an ongoing, regular process.
pg_stat_statements is a PostgreSQL extension that tracks query statistics. It is
extremely useful for finding tuning opportunities in your database schema or queries.
To use it, you first need to enable it. Add
shared_preload_libraries = 'pg_stat_statements'
to your postgresql.conf file, restart PostgreSQL, and then create the extension in each database you want to track:
CREATE EXTENSION pg_stat_statements;
pg_stat_statements carries a lot of information. If you're running these queries in
psql, PostgreSQL's standard command-line client, first turn on expanded display so the wide result rows are readable.
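In psql, expanded display is toggled with the \x meta-command:

```
\x
```

Run it again to switch back to the normal tabular output.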
The simplest query you can run, which already returns a lot of information, works from any database where the extension has been enabled:
SELECT * FROM pg_stat_statements;
When you're trying to identify badly performing queries, you often want to focus on queries
run against a single database, so let's join against
pg_database so we can
filter by database name. The following query selects only a few columns, shows the database name, and returns
the top 5 queries with the highest total execution time across all databases:
SELECT d.datname, s.total_exec_time, s.calls, s.rows, s.query
FROM pg_stat_statements s
JOIN pg_database d ON (s.dbid = d.oid)
ORDER BY total_exec_time DESC
LIMIT 5;
What if we need to find the queries consuming the largest share of the database's time? The
total_exec_time column tells us the total time spent executing the statement,
in milliseconds, with many decimal places of precision. Let's round that to 2 decimal places.
In this query we ask for the total execution time, the number of calls and rows, and calculate the average time and the "percentage CPU" usage for each query:
SELECT d.datname,
       round(s.total_exec_time::numeric, 2) AS total_exec_time,
       s.calls,
       s.rows,
       round(s.total_exec_time::numeric / calls, 2) AS avg_time,
       round((100 * s.total_exec_time / sum(s.total_exec_time::numeric) OVER ())::numeric, 2) AS percentage_cpu,
       substring(s.query, 1, 50) AS short_query
FROM pg_stat_statements s
JOIN pg_database d ON (s.dbid = d.oid)
ORDER BY percentage_cpu DESC
LIMIT 5;
With a list of the worst-performing queries in hand, we can dig deeper into each of them by running them individually through
EXPLAIN ANALYZE. That shows us the execution plan PostgreSQL used for the query, so we can analyze where the query could use improvements.
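As a sketch, this is what that looks like; the table and filter here are placeholders, so substitute one of the statements actually reported by pg_stat_statements:

```sql
-- Placeholder query: replace with a slow statement from your own database.
EXPLAIN ANALYZE
SELECT * FROM employees WHERE last_name = 'King';
```

The plan output shows each node's estimated versus actual cost, row counts, and timing, which is where sequential scans and missing indexes become visible.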
What's the average execution time of ALL queries in ALL databases?
SELECT (sum(total_exec_time) / sum(calls))::numeric(6,3) AS avg_execution_time FROM pg_stat_statements;
Let's find the top 5 queries that dirty the most blocks in
shared_buffers, the memory area
PostgreSQL uses as its shared buffer cache. Queries that dirty many blocks are modifying a lot of cached data, which generates write activity, and could perhaps use some improvements.
SELECT query, shared_blks_dirtied
FROM pg_stat_statements
WHERE shared_blks_dirtied > 0
ORDER BY shared_blks_dirtied DESC
LIMIT 5;
pg_stat_statements can tell us about queries that are executed many times and queries that take a long time, among other things.
If we combine that information with statistics from other views, such as
pg_stat_user_tables and pg_stat_user_indexes, we can find even more information to help improve performance.
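For example, the following sketch against pg_stat_user_indexes lists indexes that have never been used for a scan, which makes them candidates for review (the index names returned will be whatever exists in your database):

```sql
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```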
Often badly performing queries are slow because they're missing an index that would help the database find the right rows to return. This query will tell us which tables might be missing an index:
SELECT relname,
       seq_scan - idx_scan AS too_much_seq,
       CASE WHEN seq_scan - idx_scan > 0 THEN 'Missing Index?' ELSE 'OK' END,
       pg_relation_size(relid) AS rel_size,
       seq_scan,
       idx_scan
FROM pg_stat_user_tables
WHERE schemaname <> 'information_schema'
  AND schemaname NOT LIKE 'pg%'
ORDER BY too_much_seq DESC;
  relname    | too_much_seq |      case      | rel_size | seq_scan | idx_scan
-------------+--------------+----------------+----------+----------+----------
 dependents  |           41 | Missing Index? |   139264 |       41 |        0
 countries   |           -3 | OK             |     8192 |        1 |        4
 locations   |           -3 | OK             |     8192 |        1 |        4
 regions     |          -24 | OK             |     8192 |        1 |       25
 departments |         -927 | OK             |     8192 |       81 |     1008
 employees   |        -3469 | OK             |   106496 |      161 |     3630
The dependents table has had 41 more sequential scans than index scans, and may need an index, based on the queries that have been recently executed.
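If analysis of the queries confirms a missing index, adding one is straightforward. A hedged sketch, assuming queries filter the dependents table on an employee_id column; substitute the column your queries actually filter or join on:

```sql
-- Hypothetical: index the column used in WHERE clauses or joins.
CREATE INDEX idx_dependents_employee_id ON dependents (employee_id);
```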
It is important to understand that indexes do have a cost, and slow down writes to a table because the database has to update the index whenever data is added or changed. So one should NOT just add indexes to all columns of a table.
pg_stat_statements is a powerful way to find useful information about
queries and their effect on database performance. Combined with
EXPLAIN and data from another view,
pg_stat_activity, which we'll cover in another tutorial, you can find important information to help you better manage your applications.
Full documentation for pg_stat_statements is at https://www.postgresql.org/docs/current/pgstatstatements.html.