Postgres Tips & Tricks

From The Crunchy Data Team

Just a few helpful tips how to use Postgres to it's fullest. Have one you'd like to share or something is unclear? Share with us on twitter @crunchydata

locks

Find processes that are idle and possibly blocked by a lock

This query looks at the pg_stat_activity view for processes that are active but have a wait_event or wait_event_type that are non-NULL.


SELECT
		pid,
		datname,
		usename,
		application_name,
		client_addr,
		client_port,
		to_char (now (), 'YYYY-MM-DD HH24:MI:SS') as now,
		to_char (now () - xact_start, 'DD HH24:MI:SS MS') as xact_time,
		to_char (now () - query_start, 'DD HH24:MI:SS MS') as query_time,
		state,
		to_char (now () - state_change, 'DD HH24:MI:SS MS') as state_time,
		wait_event,
		wait_event_type,
		left (query, 40)
	  FROM
		pg_stat_activity
	  WHERE
		state != 'idle'
		and pid != pg_backend_pid ()
	  ORDER BY
		query_time desc;

Find processes that have a wait event, rolling up to the PID holding the initial lock.

This query looks at at the pg_stat_activity and pg_locks view showing the pid, state, wait_event, and lock mode, as well as blocking pids.


WITH sos AS (
			SELECT array_cat(array_agg(pid),
				   array_agg((pg_blocking_pids(pid))[array_length(pg_blocking_pids(pid),1)])) pids
			FROM pg_locks
			WHERE NOT granted
		)
		SELECT a.pid, a.usename, a.datname, a.state,
			   a.wait_event_type || ': ' || a.wait_event AS wait_event,
			   current_timestamp-a.state_change time_in_state,
			   current_timestamp-a.xact_start time_in_xact,
			   l.relation::regclass relname,
			   l.locktype, l.mode, l.page, l.tuple,
			   pg_blocking_pids(l.pid) blocking_pids,
			   (pg_blocking_pids(l.pid))[array_length(pg_blocking_pids(l.pid),1)] last_session,
			   coalesce((pg_blocking_pids(l.pid))[1]||'.'||coalesce(case when locktype='transactionid' then 1 else array_length(pg_blocking_pids(l.pid),1)+1 end,0),a.pid||'.0') lock_depth,
			   a.query
		FROM pg_stat_activity a
			 JOIN sos s on (a.pid = any(s.pids))
			 LEFT OUTER JOIN pg_locks l on (a.pid = l.pid and not l.granted)
		ORDER BY lock_depth;

Set a lock timeout

It can be a good idea to set a lock_timeout within a session so that it will cancel the transaction and relinquish any locks it was holding after a certain period of time.


ALTER SYSTEM SET lock_timeout = '10s';

Logging

Log slow queries by setting log_min_duration_statement


ALTER database postgres SET log_min_duration_statement = '250ms';

Control which statement types get logged

Control the types of statements that are logged for your database.


ALTER DATABASE postgres SET log_statement = 'all';

Valid values include all, ddl, none, mod

Log when waiting on a lock

Log when database is waiting on a lock.


ALTER DATABASE postgres SET log_lock_waits = 'on';

performance

Use statement timeouts to control runaway queries

Setting a statement timeout prevents queries from running longer than the specified time. You can set a statement timeout on the database, user, or session level. We recommend you set a global timeout on Postgres and then override that one specific users or sessions that need a longer allowed time to run.


ALTER DATABASE mydatabase SET statement_timeout = '60s';

Use pg_stat_statements to find the queries and processes that use the most resources


SELECT
	total_exec_time,
	mean_exec_time as avg_ms,
	calls,
	query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Monitor connections in Postgres

This query will provide the number of connection based on type.


SELECT count(*),
	   state
FROM pg_stat_activity
GROUP BY state;

If you see idle connections is above 20, it is recommended to explore using PgBouncer.

Query size of specific table

Will give you the size of the specific relation you pass in.


SELECT pg_relation_size('table_name');

-- For prettier formatting you can wrap with:

SELECT pg_size_pretty(pg_relation_size('table_name'));

Query all relation sizes

Will report on all table sizes in descending order


SELECT relname AS relation,
       pg_size_pretty (
         pg_total_relation_size (C .oid)
       ) AS total_size
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C .relnamespace)
WHERE nspname NOT IN (
        'pg_catalog',
        'information_schema'
      )
  AND C .relkind <> 'i'
  AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size (C .oid) DESC

Check for unused indexes

Will return the unused indexes in descending order of size. Keep in mind you want to also check replicas before dropping indexes.


SELECT schemaname || '.' || relname AS table,
       indexrelname AS index,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS "index size",
       idx_scan as "index scans"
FROM pg_stat_user_indexes ui
JOIN pg_index i ON ui.indexrelid = i.indexrelid
WHERE NOT indisunique
  AND idx_scan < 50
  AND pg_relation_size(relid) > 5 * 8192
ORDER BY 
  pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST,
  pg_relation_size(i.indexrelid) DESC;

Get approximate counts for a table

Will return the approximate count for a table based on PostgreSQL internal statistics. Useful for large tables where performing a `SELECT count(*)` is costly on performance.


SELECT reltuples::numeric as count
FROM pg_class
WHERE relname='table_name';

Non-blocking index creation

Adding `CONCURRENTLY` during index creation, while not permitted in a transaction, will not hold a lock on the table while creating your index.


CREATE INDEX CONCURRENTLY foobar ON foo (bar);

Psql

Automatically log query time in psql

Will automatically print the time it took to run a query from within psql. *Of note this is the round trip time not simply query execution time.*


\timing

You can save this in your `.psqlrc` to be a default setting

Autoformat query results in psql

Will automatically reorganize the query output based on your terminal window for better readability.


\x auto

You can save this in your `.psqlrc` to be a default setting

Edit your psql queries in editor of your choice

Will automatically open your last run query in your default `$EDITOR`. When you save and close it will execute that query.

\e

Set a value for nulls

Will render the nulls as whatever character you specify. Handy for easier parsing of nulls vs. blank text.


\pset null 👻

You can save this in your `.psqlrc` to be a default setting

Save your query history per database locally

Will automatically save a history file for each **DBNAME**.


\set HISTFILE ~/.psql_history- :DBNAME

You can save this in your `.psqlrc` to be a default setting

Show queries issued by internal psql commands

Add "-E" (or --echo-hidden) option to psql in the command line. This option will display queries that internal psql commands generate (like "\dt mytable"). This is a cool way to learn more about system catalogs, or reuse queries issued by psql in your own tool.


psql -E

Get data back, and only the data

Add "-qtA" options to psql in the command line. Those options will have psql run in quiet mode ("-q"), return tuples only ("-t") in an unaligned fashion ("-A"). Combined with "-c" option to send a single query, it can be useful for your scripts if you want the data and only that back from Postgres. Returns one line per row.


psql -qtA

Get results as an HTML table

Add "-qtH" options to psql in the command line. Those options will have psql run in quiet mode ("-q"), return tuples only ("-t") in an HTML table ("-H"). Combined with "-c" option to send a single query, can be a fast way to embed the result of a query in an HTML page.


psql -qtH

Search previous queries with Ctrl + R

Ctrl + R will start a search session and you can start typing part of the query or command to find and run it again. If you tag specific queries with a comment, this can help with searching later.


(reverse-i-search)

Clear your psql screen

Will clear your screen in current psql session


\! clear

Continually run a query with watch

Will automatically run the last query every 2 seconds and display the output. You can also specify the query that will run after watch as well.


\watch

Show non-default configs

Will list the parameter and value for Postgres configuration parameters that have been changes from the default parameters.


\dconfig

Rollback to previous statement on error when in interactive mode

When you encounter an error when in interactive mode this will automatically rollback to just before the previous command, allowing you to continue working as you would expect.


\set ON_ERROR_ROLLBACK interactive

Export a CSV from directly in psql

When providing the `--csv` value with a query, this command will run the specific query and return CSV to STDOUT.


psql <connection-string> --csv -c 'select * from test;'

Run a query from a file in psql

Will execute the specified file when inside psql.


\i filename

Provide clean border within psql

Will give you a border around the output of your queries when in psql.


\pset border 2

You can save this in your `.psqlrc` to be a default setting

Save queries in psqlrc

Add these sample queries to psqlrc for long running queries, cache hit ratio, unused_indexes, and table sizes. Then to execute inside psql use :long_running, :cache_hit, :unused_indexes, :table_sizes.


\set long_running 'SELECT pid, now() - pg_stat_activity.xact_start AS duration, query, state FROM pg_stat_activity WHERE (now() - pg_stat_activity.xact_start) > interval ''5 minutes'' ORDER by 2 DESC;'
\set cache_hit 'SELECT ''index hit rate'' AS name, (sum(idx_blks_hit)) / nullif(sum(idx_blks_hit + idx_blks_read),0) AS ratio FROM pg_statio_user_indexes UNION ALL SELECT ''table hit rate'' AS name, sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read),0) AS ratio FROM pg_statio_user_tables;'
\set unused_indexes 'SELECT schemaname || ''.'' || relname AS table, indexrelname AS index, pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size, idx_scan as index_scans FROM pg_stat_user_indexes ui JOIN pg_index i ON ui.indexrelid = i.indexrelid WHERE NOT indisunique AND idx_scan < 50 AND pg_relation_size(relid) > 5 * 8192 ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST, pg_relation_size(i.indexrelid) DESC;'
\set table_sizes 'SELECT c.relname AS name, pg_size_pretty(pg_table_size(c.oid)) AS size FROM pg_class c LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace) WHERE n.nspname NOT IN (''pg_catalog'', ''information_schema'') AND n.nspname !~ ''^pg_toast'' AND c.relkind=''r'' ORDER BY pg_table_size(c.oid) DESC;'

Set linestyle to unicode

Changes the linestyle to unicode, which when combined with above tip leads to much cleaner formatting


\pset linestyle unicode

You can save this in your `.psqlrc` to be a default setting

SQL

Look for skewed data

This query looks in pg_statistics to find cases where certain values or attributes are a larger percentage of column values. This retrieves table name, column name, and the attributes with their percentage of common value. This formats results in psql, but can also be run as a query.


SELECT starelid::regclass AS table_name,attname AS column_name, 
(SELECT string_agg('',format(E''%s': %s%%
', v,ROUND(n::numeric*100, 2)))
FROM unnest(stanumbers1,stavalues1::text::text[])nvs(n,v)) pcts
FROM pg_statistic
JOIN pg_attribute ON attrelid=starelid
AND attnum = staattnum
JOIN pg_class ON attrelid = pg_class.oid
JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
WHERE nspname NOT LIKE 'pg_%' AND nspname <> 'information_schema'
\x\g\x

Replace nulls with other value

Coalesce will use the value and if the value is null display your specified string.


SELECT id, 
       coalesce(ip, 'no IP') 
FROM logs;

You can supply two columns as well prior to your replacement value and the function will use first not null value.

Import Schema with Mapping a Foreign Data Wrapper (FDW)

Import foreign schema creates foreign tables representing those from the foreign server.


IMPORT FOREIGN SCHEMA "public";

You can IMPORT FOREIGN SCHEMA when mapping a foreign data wrapper to save you from building a new one

Generate data with generate_series

Generates values from the start to the end values supplied based on the interval. Values can be numbers or timestamps. Can be used in a FROM or JOIN clause or CTE. Commonly used when building charts and reports that require all dates to be filled.


SELECT * FROM
generate_series(now() - '3 month'::interval, now(), '1 day');

Round dates with date_trunc

Will truncate the date to the specified level of precision. Some example precision levels include: month, week, day, hour, minute.


SELECT date_trunc('day', now());

Perform time math with intervals

You can add or subtract specific amounts of time of a timestamp by casting the value you want as an interval.


SELECT now() - '1 month'::interval;

Make your session rest a bit

This function will make your session sleep for 2.5 seconds. Useful in any testing tool executing a script in a given loop where you want to pause a bit between iterations, as an example.


select pg_sleep(2.5);

utility

Dump or archive a single table

This will create a .sql file for your table named "test", which can be restored.


pg_dump -t my_table > test.sql

Are you close to overflowing an integer?

Identify all auto-incrementing columns, which SEQUENCE object it owns, data types of the column and SEQUENCE object, and percent until the sequence value exceeds the sequence or column data type.


SELECT
	seqs.relname AS sequence,
	format_type(s.seqtypid, NULL) sequence_datatype,
CONCAT(tbls.relname, '.', attrs.attname) AS owned_by,
	format_type(attrs.atttypid, atttypmod) AS column_datatype,
	pg_sequence_last_value(seqs.oid::regclass) AS last_sequence_value,
TO_CHAR((
	CASE WHEN format_type(s.seqtypid, NULL) = 'smallint' THEN
		(pg_sequence_last_value(seqs.relname::regclass) / 32767::float)
	WHEN format_type(s.seqtypid, NULL) = 'integer' THEN
		(pg_sequence_last_value(seqs.relname::regclass) / 2147483647::float)
	WHEN format_type(s.seqtypid, NULL) = 'bigint' THEN
		(pg_sequence_last_value(seqs.relname::regclass) / 9223372036854775807::float)
	END) * 100, 'fm9999999999999999999990D00%') AS sequence_percent,
TO_CHAR((
	CASE WHEN format_type(attrs.atttypid, NULL) = 'smallint' THEN
		(pg_sequence_last_value(seqs.relname::regclass) / 32767::float)
	WHEN format_type(attrs.atttypid, NULL) = 'integer' THEN
		(pg_sequence_last_value(seqs.relname::regclass) / 2147483647::float)
	WHEN format_type(attrs.atttypid, NULL) = 'bigint' THEN
		(pg_sequence_last_value(seqs.relname::regclass) / 9223372036854775807::float)
	END) * 100, 'fm9999999999999999999990D00%') AS column_percent
FROM
	pg_depend d
	JOIN pg_class AS seqs ON seqs.relkind = 'S'
		AND seqs.oid = d.objid
	JOIN pg_class AS tbls ON tbls.relkind = 'r'
		AND tbls.oid = d.refobjid
	JOIN pg_attribute AS attrs ON attrs.attrelid = d.refobjid
		AND attrs.attnum = d.refobjsubid
	JOIN pg_sequence s ON s.seqrelid = seqs.oid
WHERE
	d.deptype = 'a'
	AND d.classid = 1259;

Restore an dump/archive of a single table

This will create a database and associated data from a .sql file. Must be run from inside the folder with the test.sql file.


psql -f test.sql

Create read only user

Once already creating a specific user role, you can user the `pg_read_all_data` to grant read only access to all tables.


GRANT pg_read_all_data TO username;

View tables using TOAST

View what source tables have TOAST (the oversized attribute storage technique) tables with data and how big the TOAST tables are.


SELECT
		c.relname AS source_table_name,
		c.relpages AS source_table_number_of_pages,
		c.reltuples AS source_table_number_of_tuples,
		c.reltoastrelid AS toast_table_oid,
		t.relname AS toast_table_name,
		t.relpages AS toast_table_number_of_pages,
		t.reltuples AS toast_table_number_of_tuples
	 FROM
		pg_class c
		JOIN pg_class t ON c.reltoastrelid = t.oid
	 WHERE
		t.relpages > 0;

Find active and idle users

View roles, state, and connection count.


SELECT usename, state, count(1) cnt
FROM pg_stat_activity
WHERE usename is not null
GROUP BY usename, state
ORDER BY usename, state;

Terminate backends of a particular user ("test")

This query will kill every backend user "test" is connected to.


WITH pids AS (
  SELECT pid
  FROM pg_stat_activity
  WHERE usename='test'
)

SELECT pg_terminate_backend(pid)
FROM pids;

Cancel every running SQL commands from a particular user ("test")

This query will cancel every running query issued by the particular user "test".


WITH pids AS (
  SELECT pid 
  FROM pg_stat_activity 
  WHERE username='test'
)

SELECT pg_cancel_backend(pid)
FROM pids;