Supporting PostgreSQL DBAs is an important part of daily life here at
Crunchy Data. I’ve recently run across a few use
cases where utility queries based on the current state of the database are
needed. A simple example could be where you have a table that is the target of
logical replication and the
id column becomes out of sync with the sequence
that generated the data. This would result in new rows having primary key
conflicts. To correct this issue, you would need to set the sequence to generate
values past the current max value in the table.
This example is part of a larger class of problems which are best solved with functionality that SQL by itself does not directly provide: Dynamic DDL. Data Definition Language (DDL) in SQL itself is notoriously non-dynamic, with strict parsing rules, predefined data types, table structures, and queries based on known and articulated columns.
So how can we bend SQL to our will and execute Dynamic DDL Postgres queries without having to manually write these queries each time? In this next installment of my Devious SQL series (see posts #1 and #2), I’ll show you some SQL approaches to get the job done.
Let us again consider a scenario where we want to explicitly provide the
RESTART value for a
sequence via a
query. This is an easy thing to express in terms of what we would like to do: we
want to reset a sequence to start after the current maximum value of the table
it is associated with.
Trying the naïve approach, we get:
ALTER SEQUENCE big_table_id_seq RESTART (SELECT max(id) + 1 FROM big_table);
ERROR: syntax error at or near "(", at character 41 STATEMENT: ALTER SEQUENCE big_table_id_seq RESTART (SELECT max(id) + 1 FROM big_table);
As we can see, this approach isn't supported by the PostgreSQL grammar, as it is expecting an actual value here, not a subquery (as nice as that would be).
So what are some approaches here?
If we are using
psql, we have a few options on how to solve this problem. One
approach is using
and first selecting the value we want into a variable, then substituting this
value into the expression we pass to psql:
-- use \gset to set a psql variable with the results of this query SELECT max(id) + 1 as big_table_max from big_table \gset -- substitute the variable in a new query ALTER SEQUENCE big_table_id_seq RESTART :big_table_max ;
In this example, we are using the
\gset command to capture the results of the
first query and store it for use later in the
psql session. We then
interpolate this variable into our expression using the
which will be passed directly to the PostgreSQL server.
Another method of utilizing
psql for dynamic SQL is constructing the query as
SELECT statement returning the statements you wish to run, then using the
\gexec command to execute the underlying queries. First let's look at making
ourselves a query that returns the statement we want, then we'll run this
SELECT 'ALTER SEQUENCE big_table_id_seq RESTART ' || max(id) + 1 as query FROM big_table; SELECT 'ALTER SEQUENCE big_table_id_seq RESTART ' || max(id) + 1 as query FROM big_table \gexec
query ALTER SEQUENCE big_table_id_seq RESTART 100001 ALTER SEQUENCE
A benefit of this approach compared to the variable substitution one is that
this can work with more complex statements and multiple return values, so you
could construct queries based on arbitrary conditions and generate more than one
SQL query; the first implementation is limited to queries that return single
rows at a time. This also gives you a preview of the underlying SQL statement
that you will be running before you execute it against the server with
\gexec, so provides some level of safety if you were doing some sort of
destructive action in the query.
Not everyone uses
psql as the interface to PostgreSQL, despite its obvious
superiority :-), so are there ways to support dynamic SQL using only server-side
tools? As it so happens there are several, using basically the same approach of
plpgsql snippet to generate the query, then
EXECUTE to run the
underlying utility statement. These roughly correlate to the approaches in the
psql section above in that they work best for single or multiple dynamic
To use server-side Dynamic SQL we will need to construct our queries using
plpgsql and execute the underlying text as if we were issuing the underlying
DO $$ BEGIN EXECUTE format('ALTER SEQUENCE big_table_id_seq RESTART %s', (SELECT max(id) + 1 FROM big_table)); END $$ LANGUAGE plpgsql;
In this case we are using PostgreSQL's built-in
format() function which
substitutes arguments similar to
printf() in C-based languages. This allows us
to interpolate the subquery result we were wanting in this case, resulting in a
string that PostgreSQL can
EXECUTE and giving us the result we want.
Almost identical in function to the
DO block, we can also create a simple
plpgsql function that simply calls
EXECUTE on it input parameter like so:
CREATE OR REPLACE FUNCTION exec(raw_query text) RETURNS text AS $$ BEGIN EXECUTE raw_query; RETURN raw_query; END $$ LANGUAGE plpgsql; SELECT exec(format('ALTER SEQUENCE big_table_id_seq RESTART %s', (SELECT max(id) + 1 FROM big_table)));
CREATE FUNCTION exec ALTER SEQUENCE big_table_id_seq RESTART 100001
This may seem like a fairly pointless change compared to the previous approach, as we have basically only moved our query into a parameter that we pass in, but what it buys us is the ability to call this function against a list of queries that we construct using normal SQL, giving us the option of running each in turn.
So what type of SQL can be run in each of these sorts of approaches, and are there any restrictions in what we can run via Dynamic SQL with these methods? The main consideration about the different approaches is related to commands that need to be run outside of an explicit transaction block.
Consider if we wanted to run a
REINDEX CONCURRENTLY on all known indexes, so
we used the
exec() approach to construct a
REINDEX CONCURRENTLY statement
for all indexes in the
SELECT exec(format('REINDEX INDEX CONCURRENTLY %I', relname)) FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid WHERE relkind = 'i' AND nspname = 'public'
ERROR: REINDEX CONCURRENTLY cannot be executed from a function CONTEXT: SQL statement "REINDEX INDEX CONCURRENTLY big_table_pkey" PL/pgSQL function exec(text) line 3 at EXECUTE
As you can see here, this won't work as a function due to
needing to manage its own transaction state; in PostgreSQL, functions inherently
run inside a transaction to allow the impact of a function to either completely
succeed or completely fail. (Atomicity in ACID.)
Let's try this using
SELECT format('REINDEX INDEX CONCURRENTLY %I', relname) FROM pg_class JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid WHERE relkind = 'i' AND nspname = 'public' \gexec
\gexec handling is done by
psql, the resulting statement is
effectively run at the top-level as if it appeared literally in the SQL file.
Look for a followup blog article where I go into more advanced techniques using
Dynamic SQL, particularly using the
\gexec function or
exec() itself. Until
next time, stay devious1!
Devious: longer and less direct than the most straightforward way. ↩
September 29, 2021 •More by this author