Recently we published an article about some of the best sql subquery tools and we were talking about all the cool things you can do with CTEs. One thing that doesn’t get mentioned near enough is the use of CTEs to do work in your database moving things around. Did you know you can use CTEs for tuple shuffling? Using CTEs to update, delete, and insert data can be extremely efficient and safe for your Postgres database.
PostgreSQL 15 included the MERGE statement, which can be similar. There are however some cases which cannot be covered by this, or if you need to use PostgreSQL versions before MERGE was introduced, this technique may come in handy.
A common use case where this technique can come in handy is to move rows from one table to another in a single statement. Imagine that you have a schema with a single table and an archive table and you want to move data to the archive table from the source table when it gets to be a year inactive.
This can be accomplished via something like:
WITH deleted AS ( DELETE FROM table_a WHERE last_modified < now () - interval '1 year' RETURNING * ) INSERT INTO archive SELECT * FROM deleted;
This straightforward approach simply returns all rows from table_a which were deleted, then inserts them into our archive table (which is assumed to have the same structure as table_a).
What happens if this transaction gets interrupted? Fortunately due to the magic of MVCC, these actions all table place in a single transaction. You can of course use explicit transaction control if you were splitting this up into multiple statements run interactively or by an app, but having it be a single statement means you are guaranteed to have this be a single transaction from the get-go.
Sometimes you might want to delete all of the rows, but only archive some of them; you could accomplish this by sticking the qual in the WHERE clause of the INSERT statement, for example:
WITH deleted AS ( DELETE FROM table_a WHERE last_modified < now () - interval '1 year' RETURNING * ) INSERT INTO archive SELECT * FROM deleted WHERE priority = 'important';
Here we apply a filter to the rows which were returned by the DELETE statement and only archive those which were already marked as important. For a deeper dive into adding filters with LIMIT to UPDATE and DELETE, see my previous post.
Diving in deeper, what if we had multiple tables that we wanted to archive from. Each had the same table structure. Imagine that we wanted to track the original record’s source table and modified the archive table to include this as the first field in the archive table.
We can use more CTE clauses and handle this still in one go:
WITH deleted_a AS ( DELETE FROM table_a WHERE last_modified < now () - interval '1 year' RETURNING * ), deleted_b AS ( DELETE FROM table_b WHERE last_modified < now () - interval '1 year' RETURNING * ), deleted_c AS ( DELETE FROM table_c WHERE last_modified < now () - interval '1 year' RETURNING * ) INSERT INTO archive SELECT 'table_a', * FROM deleted_a UNION ALL SELECT 'table_b', * FROM deleted_b UNION ALL SELECT 'table_c', * FROM deleted_c;
Since our INSERT statement includes all of the DELETE clauses, we will be pulling in all of the rows that were deleted from each of them and inserting them into a single table.
This also works for UPDATE as well. UPDATE RETURNING will return the contents of the modified row, so we could simplify some logic to handle some more complex cases in a single query, for instance:
WITH target_accounts AS ( SELECT id FROM accounts WHERE type = 'savings' ), balance_update AS ( UPDATE balances SET amount = amount + 100 FROM target_accounts WHERE account_id = target_accounts.id RETURNING * ) INSERT INTO awards (account_id, award) SELECT account_id, 'met savings goal' FROM balance_update WHERE amount >= 1000;
CTEs can be used to do more complicated things, like split up a table for partitioning:
WITH source_rows AS ( DELETE FROM movies_unsorted RETURNING * ), action_movie_rows AS ( INSERT INTO action_movies SELECT * FROM source_rows WHERE category = 'action' RETURNING id ), comedy_movie_rows AS ( INSERT INTO comedy_movies SELECT * FROM source_rows WHERE category = 'comedy' RETURNING id ), romance_movie_rows AS ( INSERT INTO romance_movies SELECT * FROM source_rows WHERE category = 'romance' RETURNING id ), horror_movie_rows AS ( INSERT INTO horror_movies SELECT * FROM source_rows WHERE category = 'horror' RETURNING id ) INSERT INTO other_movies SELECT * FROM source_rows WHERE id NOT IN ( SELECT id FROM action_movie_rows UNION ALL SELECT id FROM comedy_movie_rows UNION ALL SELECT id FROM romance_movie_rows UNION ALL SELECT id FROM horror_movie_rows );
With this example, we delete our source rows for all of the movie data in “movies_unsorted”, then use multiple CTE clauses to categorize the data into the appropriate movie type, inserting in the partition that corresponds to the movies type that we’ve determined with our query, with a final catch-all that both serves to provide a way to insert any non-classified data as well as force the evaluation of the underlying CTEs (so in fact perform the INSERT into the appropriate tables).
CTEs are an important part of your toolkit and can be used for data manipulations and more complex tuple routing. Being able to name individual query pieces - including data manipulating ones like INSERT, UPDATE, or DELETE - and treating as an independent tuple source unlocks a lot of power and can be a source of creativity and problem solving.
November 2, 2023 •More by this author