Crunchy Data joins Snowflake. Read the announcement

Tutorial Instructions

Working with Time in Postgres

Since humans first started recording data, they’ve been keeping track of time. Time management is one of those absolutely crucial database tasks and Postgres does a great job of it. Postgres has a lot of options for storing and querying time so I wanted to provide an overview of some of the most common needs for storing and retrieving time data.

If you ask Postgres what time it is,


SELECT now();

You’ll get

now             
-----------------------------
 2023-04-28 18:23:58.5603+00

The default time representation here is a full timestamp string, containing the date, time, and a reference to timezone. In this case, the +00 represents equal with UTC. UTC has long been a standard time measurement following suit from the Greenwich mean time (if you’re as old as I am).

If I want to know the time in my local timezone


SELECT now() AT TIME ZONE 'America/Chicago';

The full list of timezones names you can use is stored in a system table and can be retrieved with select * from pg_timezone_names;

Data types for time

Postgres has a TIME data type, with and without a time zone if you want to store that separately from a date. This is generally not recommended since in most cases time requires an accompanying date. There’s a TIMESTAMP datatype. Adding timezone to TIMESTAMP is TIMESTAMP WITH TIMEZONE or aliased as the TIMESTAMPTZ. Without a doubt TIMESTAMPTZ is going to be the MVP of Postgres time storage. If you store data in with the full date, time, and timezone you’ll never have to worry about the server time, what time the user entered the data, what time it is where you’re querying data, or any of those crazy calculations. And you or your application can pull out the time and display it in whatever local user timezone you need.

When working with Postgres, you’ll also see epoch which is how seconds are represented. This is not a timestamp, its an integer (a double precision floating-point number, 64 bits) and it represents the number of seconds since July 1st, 1970. This can be used if you need a specific comparison or need time in that format. Postgres can easily convert back at forth between timestamps and epochs. To find the current epoch:


SELECT EXTRACT (EPOCH FROM now());

Time formats & functions

I’m an American midwesterner so of course, I would write Bastille Day like - July 14th, 1789 or 7-14-1789. Of course all my French friends would write it 14 July 1789 or 14-07-1789. And while I’d love to debate with you all over beers about the best way to do this, ISO has some standards for time formats, namely ISO 8601 which states that dates will be read like this 1789-07-14 17:30:00.000, year-month-day-time. This date format is what used in TIMESTAMP and what you’ll see most often in the database and engineering world.

Time storage has the ISO8601 best practice, however, depending on your end users or business needs, you may want to change the time format in your queries whey they’re output. So to change the time format of a query you can use the TO_CHAR function which will translate a time string into different characters.


SELECT TO_CHAR(NOW(), 'DY, Mon dd, yyyy HH24:MI:SS OF');

TO_CHAR let’s you convert the time interval string to text and characters. Then using some formatting functions, I can pull out the day of the week, an American date format, and UTC time. The result of that query would be:

Time intervals

Now that we’re fancy and can get dates in any format we want, how about calculating intervals and lapsed time in different formats?

We’ve loaded in a sample table with some train schedule data, take a peek


SELECT * FROM train_schedule LIMIT 3;

and it looks like this

trip_id | track_number | train_number |  scheduled_departure   |   scheduled_arrival    |    actual_departure    |     actual_arrival     
---------+--------------+--------------+------------------------+------------------------+------------------------+------------------------
       1 |            1 |          683 | 2023-04-29 11:15:00+00 | 2023-04-29 12:35:00+00 | 2023-04-29 11:21:00+00 | 2023-04-29 12:52:00+00
       2 |            1 |          953 | 2023-04-29 13:49:00+00 | 2023-04-29 15:10:00+00 | 2023-04-29 13:50:00+00 | 2023-04-29 15:17:00+00
       3 |            1 |          140 | 2023-04-29 15:06:00+00 | 2023-04-29 15:23:00+00 | 2023-04-29 15:06:00+00 | 2023-04-29 15:22:00+00
(3 rows)

Let’s say you are storing an update_time fields. To find your the lower and upper bounds of arrival times times in your data set you would do


SELECT min(actual_arrival) FROM train_schedule;

and


 SELECT max(actual_arrival) FROM train_schedule;

To find the interval between them:


SELECT 
(SELECT max(actual_arrival) FROM train_schedule) 
- (SELECT min(actual_arrival) 
FROM train_schedule);

Ok, so we have about 10 days of train schedule information in here.

Taking this a step further, if I want to look at intervals between scheduled time of departure and actual time of departure. I can create an arrival_delta and a subquery that compares actual arrival minus scheduled arrival.


SELECT avg(arrival_delta) 
FROM (SELECT scheduled_arrival, actual_arrival, 
	actual_arrival - scheduled_arrival AS arrival_delta 
FROM train_schedule)q;

You can also add a filter to find interval sizes. If we build on the above query but only for departures that were more than 10 minutes later than their original scheduled time we can add this interval > ‘10 minutes`.


SELECT avg(arrival_delta) 
FROM (select scheduled_arrival, actual_arrival, 
actual_arrival - scheduled_arrival AS arrival_delta 
FROM train_schedule WHERE (actual_arrival - scheduled_arrival) 
> INTERVAL '10 minutes')q;

Overlapping / intersecting time

What if I wanted to find all of the trains that were running at a specific time - or now. You can use the OVERLAP operator with the INTERVAL.


SELECT count(*) FROM train_schedule 
WHERE (actual_departure, actual_arrival) 
OVERLAPS (now(), now() - INTERVAL '2 hours');

Time Range Types

Postgres also supports working with time ranges that include both a single range, and even multiple ranges. Single ranges of the timestamptz is called tstzrange and one for multiple ranges would be tstzmultirange

For example, if we wanted to create a table in our train database that has some peak travel season fares, we could do:


CREATE TABLE fares
(peak_id int,
peak_name text,
peak_times tstzmultirange,
fare_change numeric);

INSERT INTO fares(peak_id, peak_name, peak_times, fare_change)
VALUES (1, 'holiday', '{[2023-12-24 00:00:, 2023-12-27 00:00],[2023-12-31 00:00, 2024-01-02 00:00]}', 50),
(1, 'peak_summer', '{[2023-05-27 00:00:, 2023-05-30 00:00],[2023-07-03 00:00, 2023-08-30 00:00]}', 30);

And now to query something with the mult-timezone range, Postgres has a special operator for this, @>. Let’s see if travel today is during peak time.


SELECT * from fares WHERE peak_times @> now();

Indexing time columns

Anytime you’re querying time a lot, you’ll want to add an index so that time lookups are faster. Timestamps column indexes work will with the traditional B-tree index as well as BRIN. In general, if you have tons of data entered sequentially a BRIN index is probably recommended.

A B-tree would be created like this:


 CREATE INDEX btree_actual_departure ON train_schedule (actual_departure);

And a BRIN


CREATE INDEX brin_sequential ON train_schedule USING BRIN (actual_departure);

Roll ups

So let’s say you have quite a bit of time data. Using the date_trunc function you can easily pull out timestamp data by day or date and then you can use a query to count by the date/date.

If I want to find in my train data a count of train trips per day, that would look like this:


SELECT
date_trunc('day', train_schedule.actual_departure) d,
COUNT (actual_departure)
FROM
train_schedule
GROUP BY
d
ORDER BY
d;

Roll ups won’t be the only way to deal with lots and lots of time data. Partitioning can be really helpful once you have lots of time data that can be easing sectioned off. If you’re getting into measuring analytics or metrics, there’s some options for that as well, like hyperloglog.

Summary

Thanks for spending your time learning about time ;) Some takeaways

store time in UTC +/- values
timestamptz is your bff
to_char and all of the formatting functions let you query time however you want
Postgres has lots of functions for interval and overlap so you can look at data that intersects
date_trunc can be really helpful if you want to roll up time fields and count by day or month

Loading terminal...