Tutorial Instructions
Have you ever needed to reference a table in a subquery? With something like the following, where we try to reference the accounts table in the subquery:
SELECT
accounts.id,
accounts.name,
last_purchase.*
FROM
accounts
INNER JOIN (SELECT
*
FROM purchases
WHERE account_id = accounts.id
ORDER BY created_at DESC
LIMIT 1
) AS last_purchase ON true;But, you had the following error:
ERROR: invalid reference to FROM-clause entry for table "accounts"
LINE 9: WHERE account_id = accounts.id
^
HINT: There is an entry for table "accounts", but it cannot be referenced from this part of the query.When you hear a reference to LATERAL, it is typically phrased as a LATERAL JOIN, but LATERAL is just a keyword that allows us to reference tables from the top-level query in a sub-query. The following is an example of INNER JOIN using the LATERAL keyword to find the latest purchase for all accounts (the only difference in this query and the one above is the use of the LATERAL keyword):
SELECT
accounts.id,
accounts.name,
last_purchase.*
FROM
accounts
INNER JOIN LATERAL (SELECT
*
FROM purchases
WHERE account_id = accounts.id
ORDER BY created_at DESC
LIMIT 1
) AS last_purchase ON true;The query above returns all records for accounts, then finds the latest purchases for each account.id with a limit set to one. The LATERAL keyword allows us to reference the accounts table in the subquery and filter the purchases table based on values from the accounts table.
The LATERAL keyword can be combined with any join type or can be used as an implicit join:
SELECT
accounts.id,
last_purchase.*
FROM
accounts, LATERAL (SELECT
*
FROM purchases
WHERE account_id = accounts.id
ORDER BY created_at DESC
LIMIT 1
) AS last_purchase;For this example LATERAL works fine because it’s a relatively small data set. For larger datasets, and this example GROUP BY is a better use. Just remember that when using LATERAL, it behaves as a recursive loop. For this example, it is evaluated for as many records as exist in accounts. For that reason, use the following GROUP BY example if you seek performance or scale.
Prior to LATERAL this same query was solved using GROUP BY, like the following query. This query uses GROUP BY within a CTE to find the maximum purchases.created_at for each account account_id, then we join the accounts and purchases based on their respective values.
WITH latest_purchase_per_account AS (
SELECT
account_id,
MAX(purchases.created_at) AS created_at
FROM purchases
GROUP BY 1
)
SELECT
accounts.id,
purchases.*
FROM latest_purchase_per_account
INNER JOIN accounts ON latest_purchase_per_account.account_id = accounts.id
INNER JOIN purchases ON latest_purchase_per_account.created_at = purchases.created_at
AND latest_purchase_per_account.account_id = purchases.account_id;With a large number of rows, I anticipate the GROUP BY to perform much faster than a LATERAL. But, each scenario can be slightly different so it’s impossible to generalize. It’s important to know two patterns for solving a problem.
Like most things SQL, LATERAL solves a simple problem, yet it can be used to solve complex problems. You’ll find it commonly used in GIS functions and JSON. Below, we will find matching sub-elements from a JSON structure using LATERAL, with a conditional to return all of those in California:
SELECT
accounts.id,
accounts.name,
address_elements.value->>'state' AS state,
address_elements.value->>'city' AS city
FROM
accounts,
LATERAL jsonb_array_elements(accounts.addresses) AS address_elements
WHERE
address_elements.value->>'state' = 'California';In this example, we're utilizing LATERAL with the jsonb_array_elements function to unpack a JSON array from a column in the accounts table. We then filter the results based on a specific type, allowing us to target certain elements within the JSON structure.
Nested element expansion is the most common usage of LATERAL. This is because nested elements usually exist as a limited set of values, and because LATERAL works so well with it.
In the dataset, someone decided to store all the tags for a purchase as a comma separated list. Suppose you want to find all purchases that have a specific tag. You can achieve this by using the unnest function along with the string_to_array function to split the tags field into an array and then unnest those values into their own rows.
Here's an example query to find all purchases with the tag 'electronics':
SELECT
accounts.id AS account_id,
accounts.name AS account_name,
purchases.name AS product_name,
unnested_tags.tag
FROM
accounts
INNER JOIN purchases ON accounts.id = purchases.account_id
JOIN LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag) ON true
WHERE
unnested_tags.tag = 'electronics';This query performs the following operations:
REGEXP_SPLIT_TO_ARRAY function splits the tags column of the purchases table into an array using a comma as the delimiter.UNNEST function transforms the array of tags into separate rowsThis approach provides a way to work with comma-separated values in a database and allows for complex queries on individual values that are part of a delimited string.
A similar strategy can be used for counting the purchases per tag:
SELECT
unnested_tags.tag,
COUNT(*) AS purchases_per_tag
FROM
purchases,
LATERAL unnest(REGEXP_SPLIT_TO_ARRAY(purchases.tags, E',')) AS unnested_tags(tag)
GROUP BY
unnested_tags.tag;LATERAL joins in Postgres provide a powerful and flexible way to perform complex queries, particularly when referencing a table in a subquery. Whether you're handling hierarchical data, working with JSON, or dealing with any other scenarios where you need to reference a table in a subquery, the LATERAL keyword can be an indispensable tool in your SQL toolkit. Explore its usage in your queries, and you'll find it can simplify and optimize your SQL code.
Loading terminal...
Loading terminal...