Tuple shuffling: Postgres CTEs for Moving and Deleting Table Data
Recently we published an article about some of the best sql subquery tools and we were talking about all the cool things you can do with CTEs. One thing that doesn’t get mentioned near enough is the use of CTEs to do work in your database moving things around. Did you know you can use CTEs for tuple shuffling? Using CTEs to update, delete, and insert data can be extremely efficient and safe for your Postgres database.
PostgreSQL 15 included the MERGE statement, which can be similar. There are however some cases which cannot be covered by this, or if you need to use PostgreSQL versions before MERGE was introduced, this technique may come in handy.
Deleting rows and inserting to another table
A common use case where this technique can come in handy is to move rows from one table to another in a single statement. Imagine that you have a schema with a single table and an archive table and you want to move data to the archive table from the source table when it gets to be a year inactive.
This can be accomplished via something like:
WITH
deleted AS (
DELETE FROM table_a
WHERE
last_modified < now () - interval '1 year' RETURNING *
)
INSERT INTO
archive
SELECT
*
FROM
deleted;
This straightforward approach simply returns all rows from table_a which were deleted, then inserts them into our archive table (which is assumed to have the same structure as table_a).
What happens if this transaction gets interrupted? Fortunately due to the magic of MVCC, these actions all table place in a single transaction. You can of course use explicit transaction control if you were splitting this up into multiple statements run interactively or by an app, but having it be a single statement means you are guaranteed to have this be a single transaction from the get-go.
Filtering data
Sometimes you might want to delete all of the rows, but only archive some of them; you could accomplish this by sticking the qual in the WHERE clause of the INSERT statement, for example:
WITH
deleted AS (
DELETE FROM table_a
WHERE
last_modified < now () - interval '1 year' RETURNING *
)
INSERT INTO
archive
SELECT
*
FROM
deleted
WHERE
priority = 'important';
Here we apply a filter to the rows which were returned by the DELETE statement and only archive those which were already marked as important. For a deeper dive into adding filters with LIMIT to UPDATE and DELETE, see my previous post.
More complicated examples
Diving in deeper, what if we had multiple tables that we wanted to archive from. Each had the same table structure. Imagine that we wanted to track the original record’s source table and modified the archive table to include this as the first field in the archive table.
We can use more CTE clauses and handle this still in one go:
WITH
deleted_a AS (
DELETE FROM table_a
WHERE
last_modified < now () - interval '1 year' RETURNING *
),
deleted_b AS (
DELETE FROM table_b
WHERE
last_modified < now () - interval '1 year' RETURNING *
),
deleted_c AS (
DELETE FROM table_c
WHERE
last_modified < now () - interval '1 year' RETURNING *
)
INSERT INTO
archive
SELECT
'table_a',
*
FROM
deleted_a
UNION ALL
SELECT
'table_b',
*
FROM
deleted_b
UNION ALL
SELECT
'table_c',
*
FROM
deleted_c;
Since our INSERT statement includes all of the DELETE clauses, we will be pulling in all of the rows that were deleted from each of them and inserting them into a single table.
Update
This also works for UPDATE as well. UPDATE RETURNING will return the contents of the modified row, so we could simplify some logic to handle some more complex cases in a single query, for instance:
WITH
target_accounts AS (
SELECT
id
FROM
accounts
WHERE
type = 'savings'
),
balance_update AS (
UPDATE balances
SET
amount = amount + 100
FROM
target_accounts
WHERE
account_id = target_accounts.id RETURNING *
)
INSERT INTO
awards (account_id, award)
SELECT
account_id,
'met savings goal'
FROM
balance_update
WHERE
amount >= 1000;
Partitioning
CTEs can be used to do more complicated things, like split up a table for partitioning:
WITH
source_rows AS (
DELETE FROM movies_unsorted RETURNING *
),
action_movie_rows AS (
INSERT INTO
action_movies
SELECT
*
FROM
source_rows
WHERE
category = 'action' RETURNING id
),
comedy_movie_rows AS (
INSERT INTO
comedy_movies
SELECT
*
FROM
source_rows
WHERE
category = 'comedy' RETURNING id
),
romance_movie_rows AS (
INSERT INTO
romance_movies
SELECT
*
FROM
source_rows
WHERE
category = 'romance' RETURNING id
),
horror_movie_rows AS (
INSERT INTO
horror_movies
SELECT
*
FROM
source_rows
WHERE
category = 'horror' RETURNING id
)
INSERT INTO
other_movies
SELECT
*
FROM
source_rows
WHERE
id NOT IN (
SELECT
id
FROM
action_movie_rows
UNION ALL
SELECT
id
FROM
comedy_movie_rows
UNION ALL
SELECT
id
FROM
romance_movie_rows
UNION ALL
SELECT
id
FROM
horror_movie_rows
);
With this example, we delete our source rows for all of the movie data in “movies_unsorted”, then use multiple CTE clauses to categorize the data into the appropriate movie type, inserting in the partition that corresponds to the movies type that we’ve determined with our query, with a final catch-all that both serves to provide a way to insert any non-classified data as well as force the evaluation of the underlying CTEs (so in fact perform the INSERT into the appropriate tables).
Summary
CTEs are an important part of your toolkit and can be used for data manipulations and more complex tuple routing. Being able to name individual query pieces - including data manipulating ones like INSERT, UPDATE, or DELETE - and treating as an independent tuple source unlocks a lot of power and can be a source of creativity and problem solving.
Related Articles
- Postgres Partitioning with a Default Partition
16 min read
- Iceberg ahead! Analyzing Shipping Data in Postgres
8 min read
- PostGIS Day 2024 Summary
8 min read
- Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
8 min read
- Loading the World! OpenStreetMap Import In Under 4 Hours
6 min read