Have you ever needed to reference a table in a subquery? With something like the following, where we try to reference the
accounts table in the subquery:
SELECT accounts.id, accounts.name, last_purchase.* FROM accounts INNER JOIN (SELECT * FROM purchases WHERE account_id = accounts.id ORDER BY created_at DESC LIMIT 1 ) AS last_purchase ON true;
But, you had the following error:
ERROR: invalid reference to FROM-clause entry for table "accounts" LINE 9: WHERE account_id = accounts.id ^ HINT: There is an entry for table "accounts", but it cannot be referenced from this part of the query.
When you hear a reference to
LATERAL, it is typically phrased as a
LATERAL JOIN, but
LATERAL is just a keyword that allows us to reference tables from the top-level query in a sub-query. The following is an example of
INNER JOIN using the
LATERAL keyword to find the latest purchase for all accounts (the only difference in this query and the one above is the use of the
SELECT accounts.id, accounts.name, last_purchase.* FROM accounts INNER JOIN LATERAL (SELECT * FROM purchases WHERE account_id = accounts.id ORDER BY created_at DESC LIMIT 1 ) AS last_purchase ON true;
The query above returns all records for
accounts, then finds the latest
purchases for each
account.id with a limit set to one. The
LATERAL keyword allows us to reference the
accounts table in the subquery and filter the
purchases table based on values from the
LATERAL keyword can be combined with any join type or can be used as an implicit join:
SELECT accounts.id, last_purchase.* FROM accounts, LATERAL (SELECT * FROM purchases WHERE account_id = accounts.id ORDER BY created_at DESC LIMIT 1 ) AS last_purchase;
For this example
LATERAL works fine because it’s a relatively small data set. For larger datasets, and this example
GROUP BY is a better use. Just remember that when using
LATERAL, it behaves as a recursive loop. For this example, it is evaluated for as many records as exist in accounts. For that reason, use the following
GROUP BY example if you seek performance or scale.
LATERAL this same query was solved using GROUP BY, like the following query. This query uses
GROUP BY within a CTE to find the maximum
purchases.created_at for each account
account_id, then we join the accounts and purchases based on their respective values.
WITH latest_purchase_per_account AS ( SELECT account_id, MAX(purchases.created_at) AS created_at FROM purchases GROUP BY 1 ) SELECT accounts.id, purchases.* FROM latest_purchase_per_account INNER JOIN accounts ON latest_purchase_per_account.account_id = accounts.id INNER JOIN purchases ON latest_purchase_per_account.created_at = purchases.created_at AND latest_purchase_per_account.account_id = purchases.account_id;
With a large number of rows, I anticipate the
GROUP BY to perform much faster than a
LATERAL. But, each scenario can be slightly different so it’s impossible to generalize. It’s important to know two patterns for solving a problem.
Like most things SQL,
LATERAL solves a simple problem, yet it can be used to solve complex problems. You’ll find it commonly used in GIS functions and JSON. Below, we will find matching sub-elements from a JSON structure using
LATERAL, with a conditional to return all of those in California:
SELECT accounts.id, accounts.name, address_elements.value->>'state' AS state, address_elements.value->>'city' AS city FROM accounts, LATERAL jsonb_array_elements(accounts.addresses) AS address_elements WHERE address_elements.value->>'state' = 'California';
In this example, we're utilizing
LATERAL with the
jsonb_array_elements function to unpack a JSON array from a column in the
accounts table. We then filter the results based on a specific type, allowing us to target certain elements within the JSON structure.
Nested element expansion is the most common usage of
LATERAL. This is because nested elements usually exist as a limited set of values, and because
LATERAL works so well with it.
In the dataset, someone decided to store all the tags for a purchase as a comma separated list. Suppose you want to find all purchases that have a specific tag. You can achieve this by using the
unnest function along with the
string_to_array function to split the
tags field into an array and then unnest those values into their own rows.
Here's an example query to find all purchases with the tag 'electronics':
SELECT accounts.id AS account_id, accounts.name AS account_name, purchases.name AS product_name, unnested_tags.tag FROM accounts INNER JOIN purchases ON accounts.id = purchases.account_id JOIN LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag) ON true WHERE unnested_tags.tag = 'electronics';
This query performs the following operations:
REGEXP_SPLIT_TO_ARRAYfunction splits the
tagscolumn of the
purchasestable into an array using a comma as the delimiter.
UNNESTfunction transforms the array of tags into separate rows
This approach provides a way to work with comma-separated values in a database and allows for complex queries on individual values that are part of a delimited string.
A similar strategy can be used for counting the purchases per tag:
SELECT unnested_tags.tag, COUNT(*) AS purchases_per_tag FROM purchases, LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag) GROUP BY unnested_tags.tag;
LATERAL joins in Postgres provide a powerful and flexible way to perform complex queries, particularly when referencing a table in a subquery. Whether you're handling hierarchical data, working with JSON, or dealing with any other scenarios where you need to reference a table in a subquery, the
LATERAL keyword can be an indispensable tool in your SQL toolkit. Explore its usage in your queries, and you'll find it can simplify and optimize your SQL code.