Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more
Tutorial Instructions
This day was a fairly easy one. The tech used this time:
DO
anonymous code blockregexp_split_to_table
and strpos
functionslag()
functionCOUNT(DISTINCT)
as a quick way to check for duplicatesThe goal is to find a certain sequence of letters inside a stream of characters that looks like this:
nfddjzjjjmrjjfttzctzzhqzqbbvhhcfcpcqpcqpccsmsvsswbwzw
The actual string was 4095 characters long. Our job with this challenge is to
find the first point in which four different characters appear. There are a lot
of different ways to solve this. As we are using SQL, I used a quick
CTE (Common Table Expression) to break it into a table in which each row holds a
single character. The incredibly useful regexp_split_to_table
function is
great for this. Once we have one character per row, multiple calls to the
lag()
window function allows us to add characters from the previous three
lines, yielding this:
WITH x AS (SELECT regexp_split_to_table(line, '') AS c FROM aoc_day6)
,y as (SELECT c AS c1, lag(c) over() as c2, lag(c,2) over() as c3, lag(c,3) over () as c4 FROM x)
SELECT * FROM y LIMIT 10;
c1 | c2 | c3 | c4
----+----+----+----
n | | |
f | n | |
d | f | n |
d | d | f | n
j | d | d | f
z | j | d | d
j | z | j | d
j | j | z | j
j | j | j | z
m | j | j | j
(10 rows)
Then it's just a matter of making sure each of the four is unique. All we need
to do is ensure that any of columns 1 through 4 is not the same as the others.
For a small list like this, I just threw it into the WHERE clause. Then we can
find our start-of-packet marker by using strpos
to navigate to the correct
place in the stream right after where the distinct-four characters started:
WITH x AS (SELECT regexp_split_to_table(line, '') AS c FROM aoc_day6)
,y as (SELECT c AS c1, lag(c) over() as c2, lag(c,2) over() as c3, lag(c,3) over () as c4 FROM x)
,z as (
SELECT c4||c3||c2||c1 AS sopm FROM y
WHERE c1<>c2 and c1<>c3 AND c1<>c4 and c2<>c3 AND c3<>c4 AND c2<>c4 LIMIT 1
)
SELECT z.sopm AS "Start of packet marker", 3+strpos(line, sopm) AS value FROM aoc_day6, z;
Start of packet marker | value
------------------------+-------
qgds | 1210
Is this the "best" solution? Probably not. The fastest to run? Nope. But it was the quickest to write. :)
Part two was an extension of the same challenge - rather than 4 unique
characters, we need to look for 14. Obviously, the solution above won't scale
without a crazy amount of conditions in our WHERE clause, so a different
approach is needed. Let's slide through the string, 14 characters at a time,
until we find 14 characters that are unique. For every 14-character block we
find, we will put them into a table with 14 rows, then run COUNT(DISTINCT)
to
find out how many distinct characters are in the string. If there are 14, we
have found our answer.
Looping generally requires either a recursive function, or a language such as
plpgsql. While it is possible to solve many of these solutions using recursive
SQL, they are really painful and annoying to write. A task like this is overkill
for a full plpgsql function, as we only need to run it once. Luckily, Postgres provides a way to do "one off" functions, the DO command. By default, the language used by DO
is plpgsql, but you can specify others. We will use it to loop through and pick a different starting character, and keep looping a crazy number of times, until we find an answer. Today's theme, as you may have guessed, is "quick and dirty" - and DO
commands are great for that! Here is the final SQL:
DO $$
DECLARE
sopm text;
myanswer text;
BEGIN
FOR q IN 1..10000 LOOP
WITH x AS (SELECT substring(line FROM q FOR 14) as c FROM aoc_day6)
,y AS (SELECT regexp_split_to_table(c,'') AS d FROM x)
,z AS (SELECT x.c, COUNT(DISTINCT d) FROM x,y GROUP BY x.c)
SELECT into sopm z.c FROM z WHERE count=14;
IF sopm IS NOT NULL THEN
SELECT INTO myanswer 13+strpos(line, sopm) FROM aoc_day6;
RAISE NOTICE 'Characters before start-of-message marker: %', myanswer;
EXIT;
END IF;
END LOOP;
END
$$;
NOTICE: Characters before start-of-message marker: 3476
Hopefully Day 7 is a little more challenging for a pure Postgres solution (spoiler: it is).
Loading terminal...
Loading terminal...