How to Solve Advent of Code 2022 Using Postgres...

This article will contain spoilers both on how I solved 2022 Day 15's challenge "Beacon Exclusion Zone" using SQL, as well as general ideas on how to approach the problem. I recommend trying to solve it yourself first, using your favorite language. This article is delayed from the actual puzzle's release. Also note that my solutions are seldom going to be the "best" solutions - they were solved as quickly as possible, and these articles will show my first solutions, with some minor reformatting and cleaning up.

Hands on Tutorial

We've also loaded a tutorial for Day 15's challenge if you want to try it with a pre-loaded data set.

AOC Day 15

For this day's challenge, we are presented with an input file that contains the locations of many sensors, along with the closest beacon to that sensor. The first goal is to find out how many beacons CANNOT be in a certain row. The distance from the sensor to the beacon is measured in taxi distance also known as the snake distance or the Manhattan distance. See the Wikipedia entry if you are unfamiliar with it, as it will help to understand this problem. The file consists of lines like these:

 Sensor at x=1555825, y=18926: closest beacon is at x=1498426, y=-85030
 Sensor at x=697941, y=3552290: closest beacon is at x=595451, y=3788543
 Sensor at x=3997971, y=2461001: closest beacon is at x=3951198, y=2418718

As always, we are going to employ a handful of Postgres / SQL tools to help us solve this, including:

file_fdw to read text files in from the system
sequences to help us do some basic counting
The regexp_substr and regexp_match functions to help extract numbers from a text string
CTEs (Common Table Expressions) to keep our queries readable and not descending into sub-select chaos.
The int4range range type, to efficiently gather together a lot of numbers along a line
The range_agg and unnest functions to help us sort the int4range output
The ctid system column as a quick way to exclude the "current" row from a list of rows
PL/pgSQL functions and some upsert logic

First things first, we need to put everything into a schema for neatness, and use a FDW / foreign table to get the flat file of our inputs into the database:

CREATE EXTENSION if not exists file_fdw;

CREATE SERVER if not exists aoc2022 FOREIGN DATA WRAPPER file_fdw;

DROP SCHEMA if exists aoc_beacon CASCADE;
CREATE SCHEMA aoc_beacon;
SET search_path = aoc_beacon;

DROP FOREIGN TABLE if exists aoc_day15;

CREATE FOREIGN TABLE aoc_day15 (line text)
  SERVER aoc2022 options(filename '/tmp/aoc2022.day15.input');

DROP SEQUENCE if exists aoc;
CREATE SEQUENCE aoc;

The input file is a fairly standard form - we want to first extract all the numbers from each line to eventually put them into a table. This is an excellent job for the regexp_substr() function:

WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
  SELECT id
   ,regexp_substr(line, '=(\-?\d+)', 1, 1, '', 1)::int AS s1
   ,regexp_substr(line, '=(\-?\d+)', 1, 2, '', 1)::int AS s2
   ,regexp_substr(line, '=(\-?\d+)', 1, 3, '', 1)::int AS b1
   ,regexp_substr(line, '=(\-?\d+)', 1, 4, '', 1)::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
SELECT * FROM coords LIMIT 10;

However, your version of Postgres may not support regexp_substr (needs to be at least Postgres 15), so older versions will need to use regexp_match instead:

WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
  SELECT id
   ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
   ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
   ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
   ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
SELECT * FROM coords LIMIT 10;

For both, we use a simple regex to grab the number: =(\-?\d+) The parens specify what part we want to capture, we put a backslash before the minus sign, and use a question mark to make it optional (e.g. 0 or 1 occurrences), and end with a backslash-d to indicate a digit (the plus indicates one or more). We then use all four numbers to create the snake distance for each line. The query breaks things into place nicely:

 id |   s1    |   s2    |   b1    |   b2    |  taxi
----+---------+---------+---------+---------+---------
  2 | 1555825 |   18926 | 1498426 |  -85030 |  161355
  3 |  697941 | 3552290 |  595451 | 3788543 |  338743
  4 | 3997971 | 2461001 | 3951198 | 2418718 |   89056
  5 | 3818312 |  282332 | 4823751 | 1061753 | 1784860
  6 | 2874142 | 3053631 | 3074353 | 3516891 |  663471
  7 | 1704479 | 2132468 | 1749091 | 2000000 |  177080
  8 | 3904910 | 2080560 | 3951198 | 2418718 |  384446
  9 |  657061 | 3898803 |  595451 | 3788543 |  171870
 10 | 3084398 | 3751092 | 3074353 | 3516891 |  244246
 11 | 2582061 |  972407 | 1749091 | 2000000 | 1860563
(10 rows)

For part one, we need to only look at row 2,000,000 and determine how many positions there are which cannot have the beacon we are looking for. We basically need to find all places on the line at 2,000,000 that are within the taxi distance of any sensor. For this, we are going to make use of the int4range() function to build a list of line segments within row 2 million, and then filter out the duplicates. First, we use a CASE() statement to only worry about sensors within range. If they are in range, we pull out which columns intersect row 2,000,000 based on the taxi distance for the sensor. If you have a constant like 2 million, that's also a great use of CTEs:

WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15)
,y AS (
SELECT id
   ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
   ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
   ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
   ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
,numbers AS (SELECT 2000000 AS magicrow)
,inty AS (
  SELECT id, s1, s2, taxi,
  CASE WHEN abs(s2 - magicrow) <= taxi
    THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]')
    ELSE int4range(0,0)
  END AS cannotbe
FROM coords, numbers)
SELECT * FROM inty limit 10;

 id |   s1    |   s2    |  taxi   |     cannotbe
----+---------+---------+---------+-------------------
 35 | 1555825 |   18926 |  161355 | empty
 36 |  697941 | 3552290 |  338743 | empty
 37 | 3997971 | 2461001 |   89056 | empty
 38 | 3818312 |  282332 | 1784860 | [3751120,3885505)
 39 | 2874142 | 3053631 |  663471 | empty
 40 | 1704479 | 2132468 |  177080 | [1659867,1749092)
 41 | 3904910 | 2080560 |  384446 | [3601024,4208797)
 42 |  657061 | 3898803 |  171870 | empty
 43 | 3084398 | 3751092 |  244246 | empty
 44 | 2582061 |  972407 | 1860563 | [1749091,3415032)
(10 rows)

Now that we have these into a collection of ranges, we can use the awesome range_agg() function to combine all these ranges into a single collection of non-overlapping ranges. We pass that list to unnest to combine it back into a single list of values. In our case, this is a single range, but it could have been many. It looks like this:

WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15)
,y AS (
SELECT id
   ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
   ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
   ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
   ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
,numbers AS (SELECT 2000000 AS magicrow)
,inty AS (
  SELECT id, s1, s2, taxi,
  CASE WHEN abs(s2 - magicrow) <= taxi
    THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]')
    ELSE int4range(0,0)
  END AS cannotbe
FROM coords, numbers)
,aggy AS (SELECT unnest(range_agg(cannotbe)) AS r FROM inty)
SELECT * FROM aggy;

         r
-------------------
 [-671176,4208797)
(1 row)

At this point, we are ready to solve the puzzle, so we extract all the non-overlapping ranges, figure out how "wide" they are by subtracting their lower bounds from their upper bounds, then add those all up together to get our answer. We also subtract one, as our final combined range is inclusive on one end only.

WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15)
,y AS (
SELECT id
   ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
   ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
   ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
   ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
,numbers AS (SELECT 2000000 AS magicrow)
,inty AS (
  SELECT id, s1, s2, taxi,
  CASE WHEN abs(s2 - magicrow) <= taxi
    THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]')
    ELSE int4range(0,0)
  END AS cannotbe
FROM coords, numbers)
,aggy AS (SELECT unnest(range_agg(cannotbe)) AS r FROM inty)
,rangesums AS (SELECT upper(r)-lower(r) AS mysum FROM aggy)
SELECT sum( upper(r)-lower(r) ) -1 AS partone FROM aggy;

 partone
---------
 4879972
(1 row)

Part two gets a lot more challenging. We are tasked with finding the spot on the grid that is not within range of any sensor. In other words, there is a spot on the grid somewhere that is not within the taxi range of any of our rows. Sounds pretty simple, right? Yes...but only up to a point. At first I tried a brute force solution of walking through the grid, and then walking through all the sensor ranges, but both of those were quickly ruled out due to the size of the grid. As an example, here is one of the lines from the input file. Look closely at the size of the "diamond" that is generated by the sensor expanding out to the taxi distance:

-- Sensor at x=2582061, y=972407: closest beacon is at x=1749091, y=2000000

 id |   s1    |   s2    |    b1    |   b2    |  taxi
----+---------+---------+----------+---------+---------
 11 | 2582061 |  972407 |  1749091 | 2000000 | 1860563

That diamond has a radius of 1.8 million! So this grid is entirely too large to do anything resembling a brute force approach. Not going to lie here - I struggled with this a little, until I found a good approach. The key is to realize that any open space must be bordered by more than one sensor "edge". So, we can walk the edges of each sensor (sensor plus taxi distance along the whole diamond), and look for any spot just outside the edge that is NOT in range of any other sensor. Still not cheap to do in SQL, but way better than any alternative!

Once again, we are going to put things into a grid table with rows and columns. All we care about is the location of the sensors: now that we have the taxi distance, we can discard the beacon's locations. So we will store the location of each sensor, and its taxi distance. Using the SQL above, we can quickly populate the table:

CREATE UNLOGGED TABLE sensorfarm(col int, row int, taxi int, item text);
CREATE UNIQUE INDEX aa on sensorfarm(col,row);

WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
  SELECT id
   ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1
   ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2
   ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1
   ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
INSERT into sensorfarm(col,row,taxi) select s1,s2,taxi FROM coords;

At one point, I tried using arrays, but they were slow and ran into some Postgres limits:

ERROR:  array size exceeds the maximum allowed (1073741823)

For the final function, I was able to implement the logic above: walk the outside of the diamond, rule out things that are out of bounds (the problem says it will be in the zero to four million row/column range), and check all the other diamonds for overlap. The full function is at the bottom. Let's look at it section by section:

CREATE or replace FUNCTION diamond_walker()
  returns void language plpgsql as $$
DECLARE
  maxrange int = 4000000;
  myrec record;
  item int = 0;
  mysensor record;
  current_top_row int; current_bottom_row int;
  current_left_col int; current_right_col int;
  northwest bool; northeast bool; southwest bool; southeast bool;
  oob int = 0;
BEGIN

  FOR myrec IN
    SELECT row_number() OVER (ORDER BY taxi), row, col, taxi FROM sensorfarm
  LOOP
...

This is a standard PL/pgSQL function, which takes no arguments, and returns null, as we are going to throw an exception when we find the value we are looking for. We DECLARE all of the variables we are going to use, then start the main loop, in which we iterate over each sensor in our sensorfarm table. The row_number is not needed to solve things, but helps give us some nice output as we are searching.

  current_top_row = myrec.row;
  current_bottom_row = myrec.row;
  current_left_col = myrec.col - (myrec.taxi + 1);
  current_right_col = myrec.col + (myrec.taxi + 1);

We are going to walk the outside of the diamond formed by the sensor and its taxi distance. We start with the same row as the sensor, but shoot out bot columns as wide as the taxi distance - in other words, the widest part of the diamond.

-- Walk the outside of the diamond for this sensor
  FOR t IN 1 .. myrec.taxi+1 LOOP
    -- Every once in a while, give a status report
    IF (item % 10000) = 0 THEN
      RAISE NOTICE '%Trying sensor #% at % / % taxi is %  [% candidates]  [OOB: %]', E'\033c',
        myrec.row_number,
        to_char(myrec.col, 'FM999,999,999'), to_char(myrec.row, 'FM9,999,999'),
        to_char(myrec.taxi, 'FM9,999,999'), to_char(item, 'FM999,999,999'),
        to_char(oob, 'FM999,999,999');
    END IF;

    item = item + 1;

Now we start the loop in which we will walk the outside of the diamond, checking each point along the way. As this can take a very long time to run, I added some output so that clears the screen (that magic 033c code), then does some pretty output showing which row we are on, and some totals on about how many places we have checked, and how many were declared out of bounds (explained below).

    -- There are cases where the whole diamond has wandered out of range
    -- If that happens, we abandon the entire loop
    if current_left_col > maxrange
      or current_right_col < 0
      or current_top_row > maxrange
      or current_bottom_row < 0
      THEN exit;
    END IF;

It may be that our diamond is no longer in the range that we are considering, so we have a short-circuit here to simply abandon this entire sensor if that should happen.

    -- Set each direction as not found
    northwest = false; northeast = false; southwest = false; southeast = false;

    -- Skip if this candidate is out of range
    IF current_left_col < 0
      THEN northwest = true; southwest = true; oob = oob + 1;
    END IF;
    IF current_right_col > maxrange
      THEN northeast = true; southeast = true; oob = oob + 1;
    END IF;
    IF current_top_row < 0
      THEN northwest = true; northeast = true; oob = oob + 1;
    END IF;
    IF current_bottom_row > maxrange
      THEN southwest = true; southeast = true; oob = oob + 1;
    END IF;

We are going to track each of the four edges of the diamond: northeast, northwest, southeast, and southwest. These variables are all booleans, and start as false. If we find something that disqualifies them as a potential candidate for "winning", we mark it as true. To start, we do a quick check to see if any of the four edges are out of range. If they are, we mark the appropriate boolean and also increment our out of bounds variable.

    -- Do any other diamonds contain this spot?
    FOR mysensor in SELECT col,row,taxi FROM sensorfarm WHERE ctid <> myrec.ctid LOOP
      -- Does this sensor cover our "northwest" pixel?
      IF not northwest AND abs(mysensor.col - current_left_col) + abs(mysensor.row - current_top_row) <= mysensor.taxi
        THEN northwest = true;
      END IF;
      -- Does this sensor cover our "northeast" pixel?
      IF not northeast AND abs(mysensor.col-current_right_col) + abs(mysensor.row-current_top_row) <= mysensor.taxi
        THEN northeast = true;
      END IF;
      -- Does this sensor cover our "southwest" pixel?
      IF not southwest AND abs(mysensor.col - current_left_col) + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
        THEN southwest = true;
      END IF;
      -- Does this sensor cover our "southeast" pixel?
      IF not southeast AND abs(mysensor.col - current_right_col) + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
        THEN southeast = true;
      END IF;
      -- Stop checking other sensors if all four directions are covered
      IF northwest and northeast and southwest and southeast
        THEN exit;
      END IF;
    END LOOP;

This bit of code is the meat of our function. We want to iterate through all of the other sensors to see if, for each of our edges, that sensor overlaps the spot we are looking at. Note that we use the special ctid column (the physical location of the row, more or less) to exclude comparing our current sensor to itself. If we find that another sensor has covered any of the four edges, we immediately exit the loop as we know this spot is not a winner.

    -- If any of our four directions had no coverage, they are the winner
   IF not northeast or not northwest or not southeast or not southwest
     THEN
     mysensor.col = CASE when northeast or southeast then current_right_col else current_left_col END;
     mysensor.row = CASE when northeast or northwest then current_top_row else current_bottom_row END;
     RAISE 'Winner from row %, edges of sensor at %,%. Spot %,% has a value of %',
       myrec.row_number, myrec.col, myrec.row, mysensor.col, mysensor.row,
       mysensor.col * maxrange::bigint + mysensor.row;
   END IF;

Finally, we can check if we won, as indicated by the fact that one of our edges is still false. If so, we throw an exception and give the information about the winning spot.

    current_left_col = current_left_col + 1;
    current_right_col = current_right_col - 1;
    current_top_row = current_top_row - 1;
    current_bottom_row = current_bottom_row + 1;

  END LOOP; -- outside of the diamond

END LOOP; -- each sensor

return; END; $$;

The final bit of the function is just cleanup. We move our current positions up, down, left, and right, as we continue to walk the edge of the diamond. Then we end the diamond loop, the sensor, loop, and the function ends. Here is what it looks like as it runs:

aoc_diamond_walker

And here is the final output, showing the correct solution to part two:

select diamond_walker();

Trying sensor #21 at 3,505,508 / 2,805,537 taxi is 532,165  [4,300,000 candidates]  [OOB: 1,865,312]
ERROR:  Winner from edges of sensor at 3505508,2805537. Spot 3879585,2647448 has a value of 15518342647448
CONTEXT:  PL/pgSQL function diamond_walker() line 92 at RAISE

The complete function:

CREATE or replace FUNCTION diamond_walker()
  returns void language plpgsql as $$
DECLARE
  maxrange int = 4000000;
  myrec record;
  item int = 0;
  mysensor record;
  current_top_row int; current_bottom_row int;
  current_left_col int; current_right_col int;
  northwest bool; northeast bool; southwest bool; southeast bool;
  oob int = 0;
BEGIN

  FOR myrec IN
    SELECT row_number() OVER (ORDER BY taxi), row, col, taxi FROM sensorfarm
  LOOP

  current_top_row = myrec.row;
  current_bottom_row = myrec.row;
  current_left_col = myrec.col - (myrec.taxi + 1);
  current_right_col = myrec.col + (myrec.taxi + 1);

  -- Walk the outside of the diamond
  FOR t IN 1 .. myrec.taxi+1 LOOP
    -- Every once in a while, give a status report
    IF (item % 10000) = 0 THEN
      RAISE NOTICE '%Trying sensor #% at % / % taxi is %  [% candidates]  [OOB: %]', E'\033c',
        myrec.row_number,
        to_char(myrec.col, 'FM999,999,999'), to_char(myrec.row, 'FM9,999,999'),
        to_char(myrec.taxi, 'FM9,999,999'), to_char(item, 'FM999,999,999'),
        to_char(oob, 'FM999,999,999');
    END IF;

    item = item + 1;

    -- There are cases where the whole diamond has wandered out of range
    -- If that happens, we abandon the entire loop
    if current_left_col > maxrange
      or current_right_col < 0
      or current_top_row > maxrange
      or current_bottom_row < 0
      THEN exit;
    END IF;

    -- Set each direction as not found
    northwest = false; northeast = false; southwest = false; southeast = false;

    -- Skip if this candidate is out of range
    IF current_left_col < 0
      THEN northwest = true; southwest = true; oob = oob + 1;
    END IF;
    IF current_right_col > maxrange
      THEN northeast = true; southeast = true; oob = oob + 1;
    END IF;
    IF current_top_row < 0
      THEN northwest = true; northeast = true; oob = oob + 1;
    END IF;
    IF current_bottom_row > maxrange
      THEN southwest = true; southeast = true; oob = oob + 1;
    END IF;

    -- Do any other diamonds contain this spot?
    FOR mysensor in SELECT col,row,taxi FROM sensorfarm WHERE ctid <> myrec.ctid LOOP
      -- Does this sensor cover our "northwest" pixel?
      IF not northwest AND abs(mysensor.col - current_left_col) + abs(mysensor.row - current_top_row) <= mysensor.taxi
        THEN northwest = true;
      END IF;
      -- Does this sensor cover our "northeast" pixel?
      IF not northeast AND abs(mysensor.col-current_right_col) + abs(mysensor.row-current_top_row) <= mysensor.taxi
        THEN northeast = true;
      END IF;
      -- Does this sensor cover our "southwest" pixel?
      IF not southwest AND abs(mysensor.col - current_left_col) + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
        THEN southwest = true;
      END IF;
      -- Does this sensor cover our "southeast" pixel?
      IF not southeast AND abs(mysensor.col - current_right_col) + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
        THEN southeast = true;
      END IF;
      -- Stop checking other sensors if all four directions are covered
      IF northwest and northeast and southwest and southeast
        THEN exit;
      END IF;
    END LOOP;

    -- If any of our four directions had no coverage, they are the winner
    IF not northeast or not northwest or not southeast or not southwest
      THEN
      mysensor.col = CASE when northeast or southeast then current_right_col else current_left_col END;
      mysensor.row = CASE when northeast or northwest then current_top_row else current_bottom_row END;
      RAISE 'Winner from row %, edges of sensor at %,%. Spot %,% has a value of %',
        myrec.row_number, myrec.col, myrec.row, mysensor.col, mysensor.row,
        mysensor.col * maxrange::bigint + mysensor.row;
    END IF;

    current_left_col = current_left_col + 1;
    current_right_col = current_right_col - 1;
    current_top_row = current_top_row - 1;
    current_bottom_row = current_bottom_row + 1;

  END LOOP; -- outside of the diamond

END LOOP; -- each sensor

return; END; $$;

That one was a doozy, but we got it solved! Stay tuned for more solutions soon.

Latest Articles

How to Solve Advent of Code 2022 Using Postgres - Day 15

Hands on Tutorial

AOC Day 15

Related Articles