Time for the annual Advent of Code challenge for 2022! If you are not familiar with AOC, every December a new puzzle is added each day, getting harder as it goes along, and the challenge is to solve it any way you can by transforming an input file into an answer. I'm going to use PostgreSQL and solve things solely with the use of SQL.
This article will contain spoilers both on how I solved Day 1's challenge "Calorie Counting", as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.
AOC Day 1
I am going to attempt to solve these challenges using SQL. Specifically, using Postgres and all its wonderful tricks to get as far as I can. Let's jump right into Day 1. The first challenge is to find some information about a text file that looks like this:
4428 3773 1076 8063 10386 5705 8397 1084 7661
We need to find the group of items (an empty line starts a new group) with the highest number. The bits of Postgres tech used to solve this puzzle are:
- file_fdw, a built-in foreign data wrapper for Postgres
- The CASE command
- A sequence to help us separate the rows into groups
- The aggregate function sum inside a window to allow advanced counting
Hand's on Tutorial
We've also loaded a tutorial for Day's 1 challenge if you want to try it with a pre-loaded data set.
Reading the input
The first step is to download the input and save it to a local file. In this
case, it is 2,250 lines long and stored as
While we could use the COPY command to slurp it in, this is an excellent case for the file_fdw extension, which allows you to read in external files as if they were Postgres tables! This is done with the amazing FDW (foreign data wrapper) system. So, from inside the psql prompt, we need to install the extension, create a foreign server, and then create a foreign table that points to our text file:
CREATE EXTENSION file_fdw; CREATE SERVER aoc2022 foreign data wrapper file_fdw; CREATE FOREIGN TABLE aoc_day1 (calorie int) SERVER aoc2022 options(filename '/tmp/aoc2022.day1.input', null '');
Note that we also needed to explicitly tell Postgres that an empty line should become a null. Without that option, the table would throw an error when we tried to read it. Now that we have a table with one row per line, how can we group it? Grouping similar rows together is a job for window functions. In this case, we need to somehow put all rows together until we hit a null, then we need to start a new group. Let's create a pseudo-column and use a sequence to increment it only when we find a null value in our "calorie" column:
CREATE SEQUENCE aoc; -- Call it once so that currval works. The starting number can be anything. SELECT setval('aoc', 1); SELECT calorie, CASE WHEN calorie is null THEN nextval('aoc') ELSE currval('aoc') END FROM aoc_day1; calorie | currval ---------+--------- 4428 | 1 3773 | 1 1076 | 1 ☃ | 2 8063 | 2 10386 | 2 ☃ | 3 5705 | 3 8397 | 3 1084 | 3 7661 | 3
Now we can group those together, and find out the sum of each one by using the
SUM OVER() syntax. To keep it simple, we will use an ugly subselect:
SELECT SUM(calorie) OVER(partition by currval) FROM (SELECT calorie, CASE WHEN calorie is null THEN nextval('aoc') ELSE currval('aoc') END FROM aoc_day1 ) x; sum ------- 9277 9277 9277 18449 18449 18449 22847 22847 22847 22847 22847
Note that we don't care which row it came from, nor do we care about the
duplicate entries. All we care about is the maximum value in this new table.
Rather than a subselect, let's use
a CTE to make
things look better. Since one of the goals is to do this in as few SQL
statements as possible, let's put the
setval() into the top of the CTE:
WITH setup AS (SELECT setval('aoc',1)), x AS (SELECT calorie, CASE WHEN calorie is null THEN nextval('aoc') ELSE currval('aoc') END FROM setup, aoc_day1), y AS (SELECT sum(calorie) OVER(partition by currval) from x) SELECT max(sum) FROM y; max ------- 22847
Voila! This returns the highest combined number for all the groups. The actual puzzle had 2,250 lines of input, but that was no problem for Postgres. Onwards to Day 2 of the Advent of Code challenge.
Greg Sabino Mullane
December 12, 2022 •More by this author