Crunchy Bridge for Analytics now has support for Iceberg and other new features. Learn more in our Crunchy Bridge for Analytics announcement.

Tutorial Instructions

# Advent of Code - Day 3

For day 3 of the AOC (Advent of Code), we are given a file in which each line represents items in a rucksack. There are two compartments, divided equally, and we need to find which item is in common for the sides, generate a score based on what character it is, then sum up the scores.

Let’s peek at our data first:

``````SELECT * FROM aoc_day3 limit 10;
``````
``````                      items
-----------------------------------------------
lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
FjpnFRDmbRtnbJrFJmSTsGShWVhGqGVVsmqs
ZwPvNPdzNZwfzBNLdNNNNcLvhnQhqMTVsTGSWSqGqTdVWhMT
vgLZHfvLffNLPbggnrbFpJnCbC
hzJzGjGfqmGtDQtDSvVV
McHBMMcTTHgJnWqnRqjzZnnRzR
ppvsGZhDGprrSjSllwfZ
``````

So a random assortment of strings, with the limitation that all characters are `[a-zA-z]` and there are an even number of characters in each row. Our first step is to divide things into two sections. We will use the built-in SQL functions `length()`, `left()`, and `right()` to accomplish this:

``````SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3;``````
``````-[ RECORD 1 ]--------------------------------------------
items  | lflZfgnSnlmmlgGfjGthQPtLNsQhvbHLLpSS
first  | lflZfgnSnlmmlgGfjG
second | thQPtLNsQhvbHLLpSS
-[ RECORD 2 ]--------------------------------------------
items  | zrCVDVFMJTCTcCJMwCThWbtbpbWpPbtbHPLQssLsHP
first  | zrCVDVFMJTCTcCJMwCThW
second | btbpbWpPbtbHPLQssLsHP
-[ RECORD 3 ]--------------------------------------------
items  | rBFcrwFzFwwVDcDrzTzJfnRGjllBdGZnnZfhqmdn
first  | rBFcrwFzFwwVDcDrzTzJ
second | fnRGjllBdGZnnZfhqmdn
``````

I'm trying to not repeat any functions or actions when possible during this challenge, but I think we can let the double length() slide. Now we need to find letters that are in common to both the first and second sections. This can be done in many ways, but we'll simply replace any letters in the first section that also exists in the second with an empty character, using the `regexp_replace()` function:

``````WITH q AS (
SELECT *, length(items), left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3
)
,r AS (SELECT first, second, regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q)
SELECT * FROM r;
``````
``````          first           |          second          | common
--------------------------+--------------------------+--------
lflZfgnSnlmmlgGfjG       | thQPtLNsQhvbHLLpSS       | S
zrCVDVFMJTCTcCJMwCThW    | btbpbWpPbtbHPLQssLsHP    | W
rBFcrwFzFwwVDcDrzTzJ     | fnRGjllBdGZnnZfhqmdn     | B
FjpnFRDmbRtnbJrFJm       | STsGShWVhGqGVVsmqs       | mm
ZwPvNPdzNZwfzBNLdNNNNcLv | hnQhqMTVsTGSWSqGqTdVWhMT | dd
vgLZHfvLffNLP            | bggnrbFpJnCbC            | g
hzJzGjGfqm               | GtDQtDSvVV               | GG
McHBMMcTTHgJn            | WqnRqjzZnnRzR            | n
ppvsGZhDGp               | rrSjSllwfZ               | Z
``````

We will use the left() function again to grab a single character, then do a simple mapping to get the score for that character. The puzzle states that `a-z` are worth 1-26 and `A-Z` are worth 27-52. Note that we need to use collation 'C', to ensure our "less than" check works in a case-insensitive manner. The ascii() function has the curious effect of only converting the first letter of any string passed to it, so we pass the duplicated letters directly to it. Finally, we add up all the score to get our final result.

``````WITH q AS (
SELECT *, left(items, length(items)/2) AS first, right(items, length(items)/2) AS second
FROM aoc_day3
)
,r AS (SELECT regexp_replace(first, '[^'||second||']', '', 'g') AS common FROM q)
,s AS (
SELECT ascii(common) -
(case when left(common,1) < 'a' COLLATE "C" THEN 38 else 96 end) AS score
FROM r)
SELECT sum(score) FROM s;
``````
`````` sum
-----
8401``````

This puzzle has a second part. Using the same input, we now need to group the lines into groups of three, and find out which character is in common across all three lines, then generate a score for that character. The first trick is grouping things by threes. We are going to approach this with a windowing function. First, we want to grab out current line, and the values from the two rows "before" it. I changed it to `left(x,5)` just to make the output fit a little better:

``````SELECT left(items,5) AS one,
left( LAG(items,1) over(), 5) AS two,
left( LAG(items,2) over(), 5) AS three
FROM aoc_day3;``````
``````  one  |  two  | three
-------+-------+-------
lflZf |       |
zrCVD | lflZf |
rBFcr | zrCVD | lflZf
FjpnF | rBFcr | zrCVD
ZwPvN | FjpnF | rBFcr
vgLZH | ZwPvN | FjpnF
hzJzG | vgLZH | ZwPvN
plpcM | hzJzG | vgLZH
McHBM | plpcM | hzJzG
ppvsG | McHBM | plpcM
``````

When we get to the third row, we have also captured the two rows before it. In other words, we have groups of three that we can manipulate. The next issue is to only pick every third row. As this is coming from a file via file_fdw, we can be sure that the order stays consistent. We will use a sequence to allow counting by threes and filter out the other two rows in every triad. To be on the safe side, we reset the sequence it case it already exists, to force Postgres to start counting from the number one:

``````CREATE SEQUENCE if not exists aoc;
SELECT setval('aoc', 1, false);

WITH p AS (SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three FROM aoc_day3)
SELECT left(one,5), left(two,5), left(three,5)
FROM p
WHERE 0 = nextval('aoc')%3;
``````
`````` left  | left  | left
-------+-------+-------
rBFcr | zrCVD | lflZf
vgLZH | ZwPvN | FjpnF
McHBM | plpcM | hzJzG
``````

Next we need to find, for each row, the character that all three share. We will do this similar to the regexpreplace before, but this time we need to find all shared characters between two of them, and then check \_that returned list against the final one. The setval is crucial here, so we add that to our statement as well:

``````WITH setup AS (SELECT setval('aoc',1,false))
,p AS (
SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three
FROM aoc_day3, setup)
,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3)
,r AS (
SELECT *,
regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common
FROM q)
SELECT left(one,5), common FROM r;
``````
`````` left  | common
-------+--------
rBFcr | h
vgLZH | nn
McHBM | J
``````

Finally, we simply use our code from before to grab the first character from the common column, assign it a score, and then tally the whole thing up:

``````WITH setup AS (SELECT setval('aoc',1,false))
,p AS (
SELECT items AS one, lag(items,1) over () AS two, lag(items,2) over() AS three
FROM aoc_day3, setup)
,q AS (SELECT * FROM p WHERE 0=nextval('aoc')%3)
,r AS (
SELECT *,
regexp_replace(regexp_replace(one, '[^'||two||']', '','g'), '[^'||three||']', '','g') AS common
FROM q)
,s AS (
SELECT ascii(common) -
(CASE WHEN left(common,1) < 'a' COLLATE "C" THEN 38 ELSE 96 END) AS score
FROM r)
SELECT sum(score) from s;
``````
`````` sum
-----
2641
``````

All done! Onwards to Day 4...