How to Solve Advent of Code 2022 Using Postgres - Day 4
Spoiler Alert!
This article will contain spoilers both on how I solved 2022 Day 4's challenge "Camp Cleanup" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
Hands on Tutorial
We've also loaded a tutorial for Day 4's challenge if you want to try it with a pre-loaded data set.
AOC Day 4
This challenge was actually simpler than the previous day's ones, as we were able to lean heavily on some Postgres features such as:
- file_fdw, a built-in foreign data wrapper for Postgres
- The function
split_part()
- The
int4range
data type - The range operators
@>
<@
and&&
For Day 4, we are given a text file containing two ranges of numbers on each line, like so:
71-89,66-70
24-70,23-55
19-85,18-86
50-90,50-95
55-55,56-72
3-65,5-66
98-99,66-99
14-67,14-14
4-79,78-79
13-98,10-98
Part one of the puzzle asks us to see which of the lines has one range of numbers completely inside another one. In other words, either all the numbers in the left range are contained by the numbers in the right range, or vice-versa.
This one was very easy, for Postgres contains support for doing operations such as seeing if one group of things is inside another. As these are ranges of integers, it seemed best to turn these into int4range
columns, in which we only provide the start and stop values. First, the usual setup:
CREATE EXTENSION if not exists file_fdw;
CREATE SERVER if not exists aoc2022 foreign data wrapper file_fdw;
DROP FOREIGN TABLE if exists aoc_day4;
CREATE FOREIGN TABLE aoc_day4 (a text, b text)
SERVER aoc2022 options(filename '/tmp/aoc2022.day4.input', delimiter ',');
SELECT * FROM aoc_day4 LIMIT 10;
a | b
-------+-------
71-89 | 66-70
24-70 | 23-55
19-85 | 18-86
50-90 | 50-95
55-55 | 56-72
3-65 | 5-66
98-99 | 66-99
14-67 | 14-14
4-79 | 78-79
13-98 | 10-98
Next we need to change each of those rows into an actual range of integers. We will use the int4range()
function to do this, which expects a bottom range, and a top range. Because the puzzle also contains items such as 6-6
, we also need to tell the int4range() function that we want our edges to be inclusive, which we can do by giving a third argument of []
:
SELECT a, b,
int4range(split_part(a,'-',1)::int, split_part(a,'-',2)::int, '[]') AS aa
,int4range(split_part(b,'-',1)::int, split_part(b,'-',2)::int, '[]') AS bb
FROM aoc_day4 LIMIT 10;
a | b | aa | bb
-------+-------+----------+----------
71-89 | 66-70 | [71,90) | [66,71)
24-70 | 23-55 | [24,71) | [23,56)
19-85 | 18-86 | [19,86) | [18,87)
50-90 | 50-95 | [50,91) | [50,96)
55-55 | 56-72 | [55,56) | [56,73)
3-65 | 5-66 | [3,66) | [5,67)
98-99 | 66-99 | [98,100) | [66,100)
14-67 | 14-14 | [14,68) | [14,15)
4-79 | 78-79 | [4,80) | [78,80)
13-98 | 10-98 | [13,99) | [10,99)
The output of those last two columns looks a little weird, but it does represent exactly the range we want: inclusive of the "lower bound" number as represented by a left bracket [
, and exclusive of the "upper bound" number, as shown by a right parenthesis )
.
Postgres has a lot of support for doing things with ranges, including precisely what the puzzle is asking for, which is determining if one range fits completely into another one. For that, we will use the built-in @>
and <@
range functions, which test if a left range contains the second, and if the left range is contained by the second. Both of these return true or false, so we get this:
WITH q AS (
SELECT a, b,
int4range(split_part(a,'-',1)::int, split_part(a,'-',2)::int, '[]') AS aa
,int4range(split_part(b,'-',1)::int, split_part(b,'-',2)::int, '[]') AS bb
FROM aoc_day4
)
SELECT *, aa @> bb AS l, aa <@ bb AS m FROM q LIMIT 10;
a | b | aa | bb | l | m
-------+-------+----------+----------+---+---
71-89 | 66-70 | [71,90) | [66,71) | f | f
24-70 | 23-55 | [24,71) | [23,56) | f | f
19-85 | 18-86 | [19,86) | [18,87) | f | t
50-90 | 50-95 | [50,91) | [50,96) | f | t
55-55 | 56-72 | [55,56) | [56,73) | f | f
3-65 | 5-66 | [3,66) | [5,67) | f | f
98-99 | 66-99 | [98,100) | [66,100) | f | t
14-67 | 14-14 | [14,68) | [14,15) | t | f
4-79 | 78-79 | [4,80) | [78,80) | t | f
13-98 | 10-98 | [13,99) | [10,99) | f | t
Now it's just a matter of counting up how many rows satisfy either condition:
WITH q AS (
SELECT a, b,
int4range(split_part(a,'-',1)::int, split_part(a,'-',2)::int, '[]') AS aa
,int4range(split_part(b,'-',1)::int, split_part(b,'-',2)::int, '[]') AS bb
FROM aoc_day4
)
,r AS (SELECT *, aa @> bb AS l, aa <@ bb AS m FROM q)
SELECT count(*) FROM r WHERE l OR m;
count
-------
6
(Note: your sums might be different if you used a unique file)
Done with part one! For part two, we need to get a count, but now we need to know how many of the two sections overlap at all. This was a three-second solve, as Postgres also has an overlaps operator, so we simply replace our containment operators with the overlaps operator &&
:
WITH q AS (
SELECT a, b,
int4range(split_part(a,'-',1)::int, split_part(a,'-',2)::int, '[]') AS aa
,int4range(split_part(b,'-',1)::int, split_part(b,'-',2)::int, '[]') AS bb
FROM aoc_day4
)
,r AS (SELECT *, aa && bb AS yes_overlap FROM q)
SELECT count(*) FROM r WHERE yes_overlap;
count
-------
8
That's the end! Onwards to Day 5...
Related Articles
- Postgres Partitioning with a Default Partition
16 min read
- Iceberg ahead! Analyzing Shipping Data in Postgres
8 min read
- PostGIS Day 2024 Summary
8 min read
- Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
8 min read
- Loading the World! OpenStreetMap Import In Under 4 Hours
6 min read