4 Ways to Create Date Bins in Postgres: interval, date_trunc, extract, and to_char
You followed all the best practices, your sales dates are stored in perfect timestamp format …. but now you need to get reports by day, week, quarters, and months. You need to bin, bucket, and roll up sales data in easy to view reports. Do you need a BI tool? Not yet actually. Your Postgres database has hundreds of functions that let you query data analytics by date. By using some good old fashioned SQL - you have powerful analysis and business intelligence with date details on any data set.
In this post, I’ll walk through some of the key functions querying data by date.
For a summary of the best ways to store date and time in Postgres, see Working with Time in Postgres. We also have interactive web based tutorial with lots of sample code for working with data by date, with sample data set of ecommerce orders.
Interval - the Swiss-army knife of date manipulation
The interval
is a data type used to modify other times. For instance, an interval can be added or subtracted from a known time. Interval is super handy and the first place you can go to quickly summarize data by date. Like a Swiss-army knife, it’s not always the best tool for the job, but it can be used in a pinch. Let’s talk about where it excels.
How can we run a query that returns the total sum of orders for the last 90 days? Of course, interval can be used. Without interval, we often see people using a date variable passed from an external source that has generated a date. Using now() - INTERVAL '90 days'
, you can use the same query no matter the date. The other secret sauce is the use of now()
which is a timestamp for the current time on the server.
SELECT
SUM(total_amount)
FROM
orders
WHERE
order_date >= NOW () - INTERVAL '90 days';
sum
-----------
259472.99
(1 row)
Instead of using now()
, current_date
can be used to return a date instead of a time.
SELECT
SUM(total_amount)
FROM
orders
WHERE
order_date >= current_date - INTERVAL '90 days';
These two queries are different — current_date
starts at the beginning of the day, and now()
will include a time throughout the day. When using now()
the results will match only those that occurred after the current time 90 days ago.
Commonly, people use a shorter form for intervals using cast, but it’s the same query:
SELECT
SUM(total_amount)
FROM
orders
WHERE
order_date >= NOW() - '90 days'::interval;
Using interval for binning
To create interval ranges, we can combine the use of CASE
with interval
. SQL’s CASE
performs conditional logic within queries. The format for CASE
is WHEN .. THEN
, below is a query that executes a sample case statement:
SELECT
CASE
WHEN false THEN 'not this'
WHEN true THEN 'this will show'
ELSE 'never makes it here'
END;
Now, let’s categorize orders into the time ranges: "30-60 days ago", "60-90 days ago"
SELECT
CASE
WHEN order_date BETWEEN (NOW() - INTERVAL '60 days') AND (NOW() - INTERVAL '30 days')
THEN '30-60 days ago'
WHEN order_date BETWEEN (NOW() - INTERVAL '90 days') AND (NOW() - INTERVAL '60 days')
THEN '60-90 days ago'
END AS date_range,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_sales
FROM
orders
WHERE
order_date BETWEEN (NOW() - INTERVAL '90 days') AND (NOW() - INTERVAL '30 days')
GROUP BY
date_range
ORDER BY
date_range;
date_range | total_orders | total_sales
----------------+--------------+-------------
30-60 days ago | 160 | 101754.20
60-90 days ago | 128 | 88086.24
This may look a bit complicated, but the conditional for the statement is order_date BETWEEN begining_date_value AND ending_date_value
. Since CASE
statements end after the first truthy conditional, we can simplify this a bit more:
SELECT
CASE
WHEN order_date >= NOW() - '30 days'::interval THEN '00-30 days ago'
WHEN order_date >= NOW() - '60 days'::interval THEN '30-60 days ago'
ELSE
'60-90 days ago'
END AS date_range,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_sales
FROM
orders
WHERE
order_date >= NOW() - '90 days'::interval
GROUP BY
date_range
ORDER BY
date_range;
It’s best to choose a pattern depending on how explicit you want to be with your SQL queries. Using BETWEEN
is more explicit, and may be best for teams choosing more explicit queries. The hard part about using INTERVAL
is that recent time is greater than older time — so the >=
may break the brains of those who haven’t used a lot of time manipulation.
In summary: use interval
for binning continuous time.
date_trunc - the easiest function for date binning
Use date_trunc
for binning of pre-defined time: like day, week, month, quarter, and year. Where interval logic can be complicated, date_trunc is dead simple.
At a glance, date_trunc’s name might indicate that its about formatting, but it is more powerful when combined with GROUP BY
. date_trunc is an essential part of the query toolkit when working with analytics. Simple uses of date_trunc is like the following:
/* show the beginning of the first day of the month */
SELECT date_trunc('month', current_date);
/* show the beginning of the first day of the week */
SELECT date_trunc('week', current_date);
/* show the beginning of the first day of the year */
SELECT date_trunc('year', current_date);
/* show the beginning of the first day of the current quarter */
SELECT date_trunc('quarter', current_date);
To generate a date bin, extract the period of time from the record’s date. For instance, let’s write a query to show the monthly number of orders and total order sales:
SELECT
date_trunc ('month', order_date) AS month,
COUNT(*) AS total_orders,
SUM(total_amount) AS monthly_total
FROM
orders
GROUP BY 1
ORDER BY
month;
Results would look like:
month | total_orders | monthly_total
---------------------+--------------+---------------
2024-08-01 00:00:00 | 11 | 2699.82
2024-09-01 00:00:00 | 39 | 8439.41
(2 rows)
Using GROUP BY
, Postgres counts and summates based on the unique values returned by the the date_trunc
function. The available bins for date_trunc
are: millennium, century, decade, year, quarter, week, day, hour, minute, second, millisecond.
Extract - sometimes you have to do something funky
Not all dates are nicely broken into days, months, years, etc. The extract
function extracts a specific value for a date / time type. For instance, I commonly use extract for the following:
/* returns the epoch value for a date / time */
/* I this use to send date values to Javascript */
SELECT extract('epoch' from current_date);
/* returns the hour from a time type */
SELECT extract('hour' from now());
How can this be used to bin values? For example, if you wanted to find which hours of which day of the week has the highest number and sales value of orders:
SELECT
extract('dow' from order_date) AS day_of_week,
extract('hour' from order_date) AS hour,
COUNT(*) AS total_orders,
SUM(total_amount) AS monthly_total
FROM
orders
GROUP BY 1, 2
ORDER BY 1, 2;
day_of_week | hour | total_orders | monthly_total
-------------+------+--------------+---------------
0 | 23 | 35 | 23631.56
1 | 0 | 31 | 19299.88
You'll see here Sunday is '0` and Saturday is '6'.
Where date_trunc
keeps the higher context, extract
removes all context except that which is requested.
to_char - extreme makeover date edition
It’s awkward because to_char
is both the most versatile and most hated function for date binning. The function will accept time / date, text, or numbers for additional formatting, so it’s not explicitly for date functions. It’s never failed, when I’ve used to_char, someone has told me that I could have used a better function. It can produce human readable values quickly, but it’s unsuited for data sent for additional machine processing.
Here are a few examples of to_char
:
/* extract current day of week and current hour of day based on UTC */
SELECT to_char(now(), 'DayHH24');
/* extract current day of week and current hour of day based on NYC time zone */
SELECT to_char(now() AT TIME ZONE 'America/New_York' , 'DayHH24');
This outputs the current day of the week, and current hour based on UTC time. This breaks your brain right? What does the “DayHH24” portion mean? Postgres documentation has a long list for reserved strings used by to_char:
To change the presentation of a month, using to_char to extract and format the name and year:
SELECT to_char(order_date, 'FMMonth YYYY') AS formatted_month,
COUNT(*) AS total_orders,
SUM(total_amount) AS monthly_total
FROM
orders
GROUP BY 1
ORDER BY 1;
formatted_month | total_orders | monthly_total
-----------------+--------------+---------------
August 2024 | 11 | 2699.82
September 2024 | 39 | 8439.41
Escaping reserved strings in to_char
:
The common format for quarters in finance is “Q1” / “Q2” / “Q3” and “Q4”. Using to_char
, we can extract the quarter for a time in that format. But, the “Q” is a reserved keyword for quarter. To print a “Q” without evaluating it, wrap it in double quotes:
SELECT
to_char(order_date, '"Q"Q-YYYY') AS formatted_quarter,
SUM(total_amount) AS total_amount
FROM
orders
GROUP BY 1
ORDER BY 1;
formatted_quarter | total_amount
-------------------+--------------
Q1-2022 | 313872.84
Q1-2023 | 282774.15
Q1-2024 | 287379.33
Summary
Binning is an essential tool for faceting the data for financial reports and data analysis. Dates and times are a more complex piece of information than they first appear — hours, months, hours, quarters, years. So, a single date can be facetted many ways.
Luckily, Postgres has the functions you need to work with dates. For a quick summary:
interval - modifies date / times by adding / subtracting
date_trunc - truncates a date / time — essentially rounding-down to the closest value
extract - extracts a single piece of information from a date / time (day, week, month, quarter, year)
to_char - formats output into a specific style of date format or text string.
Related Articles
- Postgres Partitioning with a Default Partition
16 min read
- Iceberg ahead! Analyzing Shipping Data in Postgres
8 min read
- PostGIS Day 2024 Summary
8 min read
- Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
8 min read
- Loading the World! OpenStreetMap Import In Under 4 Hours
6 min read