How to Solve Advent of Code 2022 Using Postgres - Day 3

Spoiler Alert!

This article will contain spoilers both on how I solved 2022 Day 3's challenge "Rucksack Reorganization" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.

Hands on Tutorial

We've also loaded a tutorial for Day's 3 challenge if you want to try it with a pre-loaded data set.

AOC Day 3

Today's challenge was solved using the following Postgres tech:

  • file_fdw, a built-in foreign data wrapper for Postgres
  • The functions length(), left(), right(), and regexp_replace()
  • The COLLATE command to allow us to carefully compare characters
  • The window function lag() to gather nearby rows
  • A sequence to allow picking certain rows

For day 3 of the AOC (Advent of Code), we are given a file in which each line represents items in a rucksack. There are two compartments, divided equally, and we need to find which item is in common for the sides, generate a score based on what character it is, then sum up the scores. As before, we are going to use file_fdw, so let's set things up and create a simple table with a single column for each row.

CREATE EXTENSION if not exists file_fdw;

CREATE SERVER if not exists aoc2022 foreign data wrapper file_fdw;

DROP FOREIGN TABLE if exists aoc_day3;

CREATE FOREIGN TABLE aoc_day3 (items text)
  SERVER aoc2022 options(filename '/tmp/aoc2022.day3.input');

SELECT * FROM aoc_day3;
                      items
--------------------------------------------------
 lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
 zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
 rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
 FjpnFRDmbRtnbJrFJmSTsGShWVhGqGVVsmqs
 ZwPvNPdzNZwfzBNLdNNNNcLvhnQhqMTVsTGSWSqGqTdVWhMT
 vgLZHfvLffNLPbggnrbFpJnCbC
 hzJzGjGfqmGtDQtDSvVV
 plpcMBNBcCTlTgCMbvtrsSVsVJDJlrwDQr
 McHBMMcTTHgJnWqnRqjzZnnRzR
 ppvsGZhDGprrSjSllwfZ

So a random assortment of strings, with the limitation that all characters are [a-zA-z] and there are an even number of characters in each row. Our first step is to divide things into two sections. We will use the built-in SQL functions length(), left(), and right() to accomplish this:

SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3;
-[ RECORD 1 ]--------------------------------------------
items  | lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
first  | lflZfgnSnlmmlgGfjG
second | thQPtLNsQhvbHLLpSS
-[ RECORD 2 ]--------------------------------------------
items  | zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
first  | zrCVDVFMJTCTcCJMwCThW
second | btbpbWpPbtbHPLQssLsHP
-[ RECORD 3 ]--------------------------------------------
items  | rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
first  | rBFcrwFzFwwVDcDrzTzJ
second | fnRGjllBdGZnnZfhqmdn

I'm trying to not repeat any functions or actions when possible during this challenge, but I think we can let the double length() slide. Now we need to find letters that are in common to both the first and second sections. This can be done in many ways, but we'll simply replace any letters in the first section that also exists in the second with an empty character, using the regexp_replace() function:

WITH q AS (
  SELECT *, length(items), left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
  FROM aoc_day3
)
,r AS (SELECT first, second, regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q)
SELECT * FROM r;
          first           |          second          | common
--------------------------+--------------------------+--------
 lflZfgnSnlmmlgGfjG       | thQPtLNsQhvbHLLpSS       | S
 zrCVDVFMJTCTcCJMwCThW    | btbpbWpPbtbHPLQssLsHP    | W
 rBFcrwFzFwwVDcDrzTzJ     | fnRGjllBdGZnnZfhqmdn     | B
 FjpnFRDmbRtnbJrFJm       | STsGShWVhGqGVVsmqs       | mm
 ZwPvNPdzNZwfzBNLdNNNNcLv | hnQhqMTVsTGSWSqGqTdVWhMT | dd
 vgLZHfvLffNLP            | bggnrbFpJnCbC            | g
 hzJzGjGfqm               | GtDQtDSvVV               | GG
 plpcMBNBcCTlTgCMb        | vtrsSVsVJDJlrwDQr        | ll
 McHBMMcTTHgJn            | WqnRqjzZnnRzR            | n
 ppvsGZhDGp               | rrSjSllwfZ               | Z

We will use the left() function again to grab a single character, then do a simple mapping to get the score for that character. The puzzle states that a-z are worth 1-26 and A-Z are worth 27-52. Note that we need to use collation 'C', to ensure our "less than" check works in a case-insensitive manner. The ascii() function has the curious effect of only converting the first letter of any string passed to it, so we pass the duplicated letters directly to it. Finally, we add up all the score to get our final result.

WITH q AS (
  SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
  FROM aoc_day3
)
,r AS (SELECT regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q)
,s AS (
  SELECT ascii(common) -
    (case when left(common,1) < 'a' COLLATE "C" THEN 38 else 96 end) AS score
  FROM r)
SELECT sum(score) FROM s;
 sum
-----
 257

(Note: your sums might be different if you used a unique file)

This puzzle has a second part. Using the same input, we now need to group the lines into groups of three, and find out which character is in common across all three lines, then generate a score for that character. The first trick is grouping things by threes. We are going to approach this with a windowing function. First, we want to grab our current line, and the values from the two rows "before" it. I changed it to left(x,5) just to make the output fit a little better:

SELECT left(items,5) AS one,
 left( LAG(items,1) over(), 5) AS two,
 left( LAG(items,2) over(), 5) AS three
FROM aoc_day3;
  one  |  two  | three
-------+-------+-------
 lflZf | ☃     | ☃
 zrCVD | lflZf | ☃
 rBFcr | zrCVD | lflZf
 FjpnF | rBFcr | zrCVD
 ZwPvN | FjpnF | rBFcr
 vgLZH | ZwPvN | FjpnF
 hzJzG | vgLZH | ZwPvN
 plpcM | hzJzG | vgLZH
 McHBM | plpcM | hzJzG
 ppvsG | McHBM | plpcM

(Do you want to build a snowman? In the example above, the snowman appears as a symbol for a NULL entry, which I accomplished by adding this to my ~/.psqlrc file: \pset null ☃)

When we get to the third row, we have also captured the two rows before it. In other words, we have groups of three that we can manipulate. The next issue is to only pick every third row. As this is coming from a file via file_fdw, we can be sure that the order stays consistent. We will use a sequence to allow counting by threes and filter out the other two rows in every triad. To be on the safe side, we reset the sequence it case it already exists, to force Postgres to start counting from the number one:

CREATE SEQUENCE if not exists aoc;
SELECT setval('aoc', 1, false);

WITH p AS (SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three FROM aoc_day3)
SELECT left(one,5), left(two,5), left(three,5)
FROM p
WHERE 0 = nextval('aoc')%3;
 left  | left  | left
-------+-------+-------
 rBFcr | zrCVD | lflZf
 vgLZH | ZwPvN | FjpnF
 McHBM | plpcM | hzJzG

Next we need to find, for each row, the character that all three share. We will do this similar to the regexp replace before, but this time we need to find all shared characters between two of them, and then check that returned list against the final one. The setval is crucial here, so we add that to our statement as well:

WITH setup AS (SELECT setval('aoc',1,false))
,p AS (
  SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three
  FROM aoc_day3, setup)
,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3)
,r AS (
  SELECT *,
    regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common
  FROM q)
SELECT left(one,5), common FROM r;
 left  | common
-------+--------
 rBFcr | h
 vgLZH | nn
 McHBM | J

Finally, we simply use our code from before to grab the first character from the common column, assign it a score, and then tally the whole thing up:

WITH setup AS (SELECT setval('aoc',1,false))
,p AS (
  SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three
  FROM aoc_day3, setup)
,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3)
,r AS (
  SELECT *,
    regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common
  FROM q)
,s AS (
  SELECT ascii(common) -
    (CASE WHEN left(common,1) < 'a' COLLATE "C" THEN 38 ELSE 96 END) AS score
  FROM r)
SELECT sum(score) from s;
 sum
-----
  58

All done! Onwards to Day 4...

Avatar for Greg Sabino Mullane

Written by

Greg Sabino Mullane

December 14, 2022 More by this author