Temporal Joins
5 min readMore by this author
My first thought seeing a temporal join in 2008 was, “Why is this query so complex?” The company I was at relied heavily on database queries, as it was a CRM and student success tracking system for colleges and universities. The query returned a filtered list of users and their last associated record from a second table. The hard part about the query isn’t returning the last timestamp or even performing joins, it’s returning only their last associated record from a second table.
Back in 2008, we didn’t have window functions or CTEs, so the query algorithm was a series of nested tables that looked like this:
SELECT
*
FROM users, ( -- find the record for the last second_table by created_at and user_id
SELECT
second_table.*
FROM second_table, ( -- find the last second_table created_at per user_id
SELECT
user_id,
max(created_at) AS created_at
FROM second_table
GROUP BY 1
) AS last_second_table_at
WHERE
last_second_table_at.user_id = second_table.user_id
AND second_table.created_at = last_second_table_at.created_at
) AS last_second_table
WHERE users.id = last_second_table.user_id;
See the Sample Code section below for the schema and data to run these queries.
But, even that query was wrong because the second table may have records with duplicate created_at values. That was the source of a bug back in 2008 that resulted in duplicate rows being listed.
Obviously, we weren't using Postgres at the time because there has always been a simpler way to do this in Postgres using DISTINCT ON:
SELECT DISTINCT ON (u.id)
u.id,
u.name,
s.created_at AS last_action_time,
s.action_type
FROM users u
JOIN second_table s ON u.id = s.user_id
ORDER BY u.id, s.created_at DESC, s.id DESC;
Temporal joins require attention to detail.
Robust Solution: CTEs & Window Functions
Before we go too far into the topic, for those looking for a solution to their current problem, below is how I would write that query today if you aren't finding the first or last in a series. For these situations, we use CTEs and window functions, so there's no need to nest queries when we can abstract them for a cleaner purpose. Here is the template for the temporal joins that do not work with DISTINCT ON:
WITH max_second_table AS (
SELECT
*
FROM (
SELECT
*,
-- Use ROW_NUMBER() window function to return the latest record:
-- The ORDER BY clause is critical:
-- 1. ORDER BY created_at DESC finds the latest time.
-- 2. ORDER BY id DESC serves as a reliable tie-breaker
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC, id DESC) AS row_order
FROM second_table
) AS ordered_second_table
WHERE row_order = 2
)
SELECT
*
FROM users
LEFT JOIN max_second_table ON users.id = max_second_table.user_id;
In this example, we are joining the second occurrence (WHERE row_order = 2) in the second_table for a user. For the university example, we used these types of queries to report on progress over time by showing the 1st, 2nd, 3rd, and
Is this actually less code than the first example? No, but it is compartmentalized with a cleaner purpose.
Also, introducing the primary key (id) in the ORDER BY clause provides the necessary tie-breaker logic for the sorting -- that is how we fixed the SQL issue in the opening text.
Problem with ORMs
Due to their query complexity, ORMs are generally not capable of handling temporal joins without complex manipulation. The ORM I'm most familiar with is ActiveRecord, part of the Ruby on Rails suite. When Rails developers encounter temporal joins, they typically resort to the N+1 query pattern from their application code like this:
users = User.all
users.each do |user|
last_action = user.second_table.last
end
If you aren't running this query too frequently or over too many user records, this is generally performant enough. However, this approach becomes suboptimal for application performance as the user list grows because each iteration of the loop requires a network hop back and forth with the database and an object initialization in the application. While you can make ActiveRecord do this natively, the resulting code is often harder to read and maintain for the typical use case—a pattern you see in other ORMs as well.
Sample Code
Below is the sample SQL you can use to load data into your database to test a few of these queries. Note that Alice has two actions at the exact same timestamp to replicate the original bug scenario.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT
);
CREATE TABLE second_table (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
action_type TEXT,
created_at TIMESTAMP WITHOUT TIME ZONE
);
INSERT INTO users (name) VALUES ('Alice'), ('Bob'), ('Charlie');
-- Alice has two actions at the exact same timestamp (The 2008 bug scenario)
INSERT INTO second_table (user_id, action_type, created_at) VALUES
(1, 'login', '2023-10-01 10:00:00'),
(1, 'page_view', '2023-10-01 10:00:00'),
(2, 'purchase', '2023-10-02 11:00:00'),
(3, 'registration', '2023-10-03 12:00:00'),
(3, 'profile_update', '2023-10-04 13:00:00');
Conclusion
The term "temporal join" isn't a common piece of developer jargon, but the underlying pattern, retrieving the
Using the PostgreSQL feature DISTINCT ON for the simplest case, or CTEs with Window Functions for complex retrieval, we avoid the bugs of older SQL patterns and eliminate the performance penalty of the N+1 problem.
If you would like to learn more about advanced SQL patterns, check out our Postgres Playground.