Announcing Crunchy Bridge for Analytics—interact with your data lake using Postgres commands via extension, with a vectorized, parallel query engine. Learn more in our announcement.

Tutorial Instructions

Advent of Code - Day 15

For the AOC Day 15 challenge, we are presented with an input file that contains the locations of many sensors, along with the closest beacon to that sensor. The first goal is to find out how many beacons CANNOT be in a certain row. The distance from the sensor to the beacon is measured in taxi distance also known as the snake distance or the Manhattan distance. See the Wikipedia entry if you are unfamiliar with it, as it will help to understand this problem. The file consists of lines like these:

 Sensor at x=1555825, y=18926: closest beacon is at x=1498426, y=-85030
 Sensor at x=697941, y=3552290: closest beacon is at x=595451, y=3788543
 Sensor at x=3997971, y=2461001: closest beacon is at x=3951198, y=2418718

As always, we are going to employ a handful of Postgres / SQL tools to help us solve this, including:

The input is in a fairly standard form - we want to first extract all the numbers from each line to eventually put them into a table. This is an excellent job for the regexp_substr() function:

WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15)
,y AS (
  SELECT id
   ,regexp_substr(line, '=(\-?\d+)', 1, 1, '', 1)::int AS s1
   ,regexp_substr(line, '=(\-?\d+)', 1, 2, '', 1)::int AS s2
   ,regexp_substr(line, '=(\-?\d+)', 1, 3, '', 1)::int AS b1
   ,regexp_substr(line, '=(\-?\d+)', 1, 4, '', 1)::int AS b2
FROM x)
,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y)
SELECT * FROM coords LIMIT 10;

However, your version of Postgres may not support regexp_substr (needs to be at least Postgres 15), so older versions will need to use regexp_match instead:

WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15) ,y AS ( SELECT id ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1 ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2 ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1 ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2 FROM x) ,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y) SELECT * FROM coords LIMIT 10;

For both, we use a simple regex to grab the number: =(\-?\d+) The parens specify what part we want to capture, we put a backslash before the minus sign, and use a question mark to make it optional (e.g. 0 or 1 occurrences), and end with a backslash-d to indicate a digit (the plus sign indicates one or more). We then use all four numbers to create the snake distance for each line. The query breaks things into place nicely:

 id |   s1    |   s2    |   b1    |   b2    |  taxi
----+---------+---------+---------+---------+---------
  2 | 1555825 |   18926 | 1498426 |  -85030 |  161355
  3 |  697941 | 3552290 |  595451 | 3788543 |  338743
  4 | 3997971 | 2461001 | 3951198 | 2418718 |   89056
  5 | 3818312 |  282332 | 4823751 | 1061753 | 1784860
  6 | 2874142 | 3053631 | 3074353 | 3516891 |  663471
  7 | 1704479 | 2132468 | 1749091 | 2000000 |  177080
  8 | 3904910 | 2080560 | 3951198 | 2418718 |  384446
  9 |  657061 | 3898803 |  595451 | 3788543 |  171870
 10 | 3084398 | 3751092 | 3074353 | 3516891 |  244246
 11 | 2582061 |  972407 | 1749091 | 2000000 | 1860563
(10 rows)

For part one, we need to only look at row 2,000,000 and determine how many positions there are which cannot have the beacon we are looking for.We basically need to find all places on the line at 2,000,000 that are within the taxi distance of any sensor. For this, we are going to make use of the int4range() function to build a list of line segments at row 2 million, and then filter out the duplicates. First, we use a CASE() statement to only worry about sensors within range. If they are in range, we pull out which columns intersect row 2,000,000 based on the taxi distance for the sensor. If you have a constant like 2 million, that's also a great use of CTEs:

WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15) ,y AS ( SELECT id ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1 ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2 ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1 ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2 FROM x) ,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y) ,numbers AS (SELECT 2000000 AS magicrow) ,inty AS ( SELECT id, s1, s2, taxi, CASE WHEN abs(s2 - magicrow) <= taxi THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]') ELSE int4range(0,0) END AS cannotbe FROM coords, numbers) SELECT * FROM inty limit 10;
 id |   s1    |   s2    |  taxi   |     cannotbe
----+---------+---------+---------+-------------------
 35 | 1555825 |   18926 |  161355 | empty
 36 |  697941 | 3552290 |  338743 | empty
 37 | 3997971 | 2461001 |   89056 | empty
 38 | 3818312 |  282332 | 1784860 | [3751120,3885505)
 39 | 2874142 | 3053631 |  663471 | empty
 40 | 1704479 | 2132468 |  177080 | [1659867,1749092)
 41 | 3904910 | 2080560 |  384446 | [3601024,4208797)
 42 |  657061 | 3898803 |  171870 | empty
 43 | 3084398 | 3751092 |  244246 | empty
 44 | 2582061 |  972407 | 1860563 | [1749091,3415032)
(10 rows)

Now that we have these into a collection of ranges, we can use the awesome range_agg() function to combine all these ranges into a single collection of non-overlapping ranges. We pass that list to unnest to combine it back into a single list of values. In our case, this is a single range, but it could have been many. It looks like this:

WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15) ,y AS ( SELECT id ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1 ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2 ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1 ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2 FROM x) ,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y) ,numbers AS (SELECT 2000000 AS magicrow) ,inty AS ( SELECT id, s1, s2, taxi, CASE WHEN abs(s2 - magicrow) <= taxi THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]') ELSE int4range(0,0) END AS cannotbe FROM coords, numbers) ,aggy AS (SELECT unnest(range_agg(cannotbe)) AS r FROM inty) SELECT * FROM aggy;
         r
-------------------
 [-671176,4208797)
(1 row)

At this point, we are ready to solve the puzzle, so we extract all the non-overlapping ranges, figure out how "wide" they are by subtracting their lower bounds from their upper bounds, then add those all up together to get our answer. We also subtract one, as our final combined range is inclusive on one end only.

WITH x AS (SELECT nextval('aoc') as id, line from aoc_day15) ,y AS ( SELECT id ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1 ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2 ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1 ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2 FROM x) ,coords AS (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y) ,numbers AS (SELECT 2000000 AS magicrow) ,inty AS ( SELECT id, s1, s2, taxi, CASE WHEN abs(s2 - magicrow) <= taxi THEN int4range(s1 - (taxi-abs(s2-magicrow)), s1 + taxi-(abs(s2-magicrow)), '[]') ELSE int4range(0,0) END AS cannotbe FROM coords, numbers) ,aggy AS (SELECT unnest(range_agg(cannotbe)) AS r FROM inty) ,rangesums AS (SELECT upper(r)-lower(r) AS mysum FROM aggy) SELECT sum( upper(r)-lower(r) ) -1 AS partone FROM aggy;
 partone
---------
 4879972
(1 row)

Part Two gets a lot more challenging. We are tasked with finding the spot on the grid that is not within range of any sensor. In other words, there is a spot on the grid somewhere that is not within the taxi range of any of our rows. Sounds pretty simple, right? Yes...but only up to a point. At first I tried a brute force solution of walking through the grid, and then walking through all the sensor ranges, but both of those were quickly ruled out due to the size of the grid. As an example, here is one of the lines from the input file. Look closely at the size of the "diamond" that is generated by the sensor expanding out to the taxi distance:

-- Sensor at x=2582061, y=972407: closest beacon is at x=1749091, y=2000000

 id |   s1    |   s2    |    b1    |   b2    |  taxi
----+---------+---------+----------+---------+---------
 11 | 2582061 |  972407 |  1749091 | 2000000 | 1860563

That diamond has a radius of 1.8 million! So this grid is entirely too large to do anything resembling a brute force approach. Not going to lie here - I struggled with this a little, until I found a good approach. The key is to realize that any open space must be bordered by more than one sensor "edge". So, we can walk the edges of each sensor (sensor plus taxi distance along the whole diamond), and look for any spot just outside the edge that is NOT in range of any other sensor. Still not cheap to do in SQL, but way better than any alternative!

Once again, we are going to put things into a grid table with rows and columns. All we care about is the location of the sensors: now that we have the taxi distance, we can discard the beacon's locations. So we will store the location of each sensor, and its taxi distance. Using the SQL above, we can quickly populate the table:

CREATE UNLOGGED TABLE sensorfarm(col int, row int, taxi int, item text); CREATE UNIQUE INDEX aa on sensorfarm(col,row); WITH x AS (SELECT nextval('aoc') AS id, line FROM aoc_day15) ,y AS ( SELECT id ,(regexp_match(line, 'x=(\-?\d+)'))[1]::int AS s1 ,(regexp_match(line, 'y=(\-?\d+)'))[1]::int AS s2 ,(regexp_match(line, 'beacon is at x=(\-?\d+)'))[1]::int AS b1 ,(regexp_match(line, 'y=(\-?\d+)$'))[1]::int AS b2 FROM x) ,coords as (SELECT *, abs(s1-b1) + abs(s2-b2) AS taxi FROM y) INSERT into sensorfarm(col,row,taxi) select s1,s2,taxi FROM coords;

At one point, I tried using arrays, but they were slow and ran into some Postgres limits:

ERROR:  array size exceeds the maximum allowed (1073741823)

For the final function, I was able to implement the logic above: walk the outside of the diamond, rule out things that are out of bounds (the problem says it will be in the zero to four million row/column range), and check all the other diamonds for overlap. The full function is at the bottom. Let's look at it section by section:

CREATE or replace FUNCTION diamond_walker()
  returns void language plpgsql as $$
DECLARE
  maxrange int = 4000000;
  myrec record;
  item int = 0;
  mysensor record;
  current_top_row int; current_bottom_row int;
  current_left_col int; current_right_col int;
  northwest bool; northeast bool; southwest bool; southeast bool;
  oob int = 0;
BEGIN

  FOR myrec IN
    SELECT row_number() OVER (ORDER BY taxi), row, col, taxi FROM sensorfarm
  LOOP

This is a standard plpgsql function, which takes no arguments, and returns null, as we are going to throw an exception when we find the value we are looking for. We DECLARE all of the variables we are going to use, then start the main loop, in which we iterate over each sensor in our sensorfarm table. The row_number is not needed to solve things, but helps give us some nice output as we are searching.

  current_top_row = myrec.row;
  current_bottom_row = myrec.row;
  current_left_col = myrec.col - (myrec.taxi + 1);
  current_right_col = myrec.col + (myrec.taxi + 1);

We are going to walk the outside of the diamond formed by the sensor and its taxi distance. We start with the same row as the sensor, but shoot out bot columns as wide as the taxi distance - in other words, the widest part of the diamond.

-- Walk the outside of the diamond for this sensor
  FOR t IN 1 .. myrec.taxi+1 LOOP
    -- Every once in a while, give a status report
    IF (item % 10000) = 0 THEN
      RAISE NOTICE 
        '%Trying sensor #% at % / % taxi is %  [% candidates]  [OOB: %]', E'\033c',
        myrec.row_number,
        to_char(myrec.col, 'FM999,999,999'), to_char(myrec.row, 'FM9,999,999'),
        to_char(myrec.taxi, 'FM9,999,999'), to_char(item, 'FM999,999,999'),
        to_char(oob, 'FM999,999,999');
    END IF;

    item = item + 1;

Now we start the loop in which we will walk the outside of the diamond, checking each point along the way. As this can take a very long time to run, I added some output so that clears the screen (that magic 033c code, then does some pretty output showing which row we are on, and some totals on about how many places we have checked, and how many were declared out of bounds (explained below).

    -- There are cases where the whole diamond has wandered out of range
    -- If that happens, we abandon the entire loop
    if current_left_col > maxrange
      or current_right_col < 0
      or current_top_row > maxrange
      or current_bottom_row < 0
      THEN exit;
    END IF;

It may be that our diamond is no longer in the range that we are considering, so we have a short-circuit here to simply abandon this entire sensor if that should happen.

    -- Set each direction as not found
    northwest = false; northeast = false; southwest = false; southeast = false;

    -- Skip if this candidate is out of range
    IF current_left_col < 0
      THEN northwest = true; southwest = true; oob = oob + 1;
    END IF;
    IF current_right_col > maxrange
      THEN northeast = true; southeast = true; oob = oob + 1;
    END IF;
    IF current_top_row < 0
      THEN northwest = true; northeast = true; oob = oob + 1;
    END IF;
    IF current_bottom_row > maxrange
      THEN southwest = true; southeast = true; oob = oob + 1;
    END IF;

We are going to track each of the four edges of the diamond: northeast, northwest, southeast, and southwest. These variables are all booleans, and start as false. If we find something that disqualifies them as a potential candidate for "winning", we mark it as true. To start, we do a quick check to see if any of the four edges are out of range. If they are, we mark the appropriate boolean and also increment our out of bounds variable.

    -- Do any other diamonds contain this spot?
    FOR mysensor in SELECT col,row,taxi FROM sensorfarm WHERE ctid <> myrec.ctid LOOP
      -- Does this sensor cover our "northwest" pixel?
      IF not northwest AND abs(mysensor.col - current_left_col) 
           + abs(mysensor.row - current_top_row) <= mysensor.taxi
        THEN northwest = true;
      END IF;
      -- Does this sensor cover our "northeast" pixel?
      IF not northeast AND abs(mysensor.col-current_right_col) 
           + abs(mysensor.row-current_top_row) <= mysensor.taxi
        THEN northeast = true;
      END IF;
      -- Does this sensor cover our "southwest" pixel?
      IF not southwest AND abs(mysensor.col - current_left_col) 
           + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
        THEN southwest = true;
      END IF;
      -- Does this sensor cover our "southeast" pixel?
      IF not southeast AND abs(mysensor.col - current_right_col) 
           + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
        THEN southeast = true;
      END IF;
      -- Stop checking other sensors if all four directions are covered
      IF northwest and northeast and southwest and southeast
        THEN exit;
      END IF;
    END LOOP;

This bit of code is the meat of our function. We want to iterate through all of the other sensors to see if, for each of our edges, that sensor overlaps the spot we are looking at. Note that we use the special ctid column (the physical location of the row, more or less) to exclude comparing our current sensor to itself. If we find that another sensor has covered any of the four edges, we immediately exit the loop as we know this spot is not a winner.

    -- If any of our four directions had no coverage, they are the winner
   IF not northeast or not northwest or not southeast or not southwest
     THEN
     mysensor.col = CASE when northeast or southeast
       THEN current_right_col else current_left_col END;
     mysensor.row = CASE when northeast or northwest
       THEN current_top_row else current_bottom_row END;
     RAISE 'Winner from row %, edges of sensor at %,%. Spot %,% has a value of %',
       myrec.row_number, myrec.col, myrec.row, mysensor.col, mysensor.row,
       mysensor.col * maxrange::bigint + mysensor.row;
   END IF;

Finally, we can check if we won, as indicated by the fact that one of our edges is still false. If so, we throw an exception and give the information about the winning spot.

    current_left_col = current_left_col + 1;
    current_right_col = current_right_col - 1;
    current_top_row = current_top_row - 1;
    current_bottom_row = current_bottom_row + 1;

  END LOOP; -- outside of the diamond

END LOOP; -- each sensor

return; END; $$;

The final bit of the function is just cleanup. We move our current positions up, down, left, and right, as we continue to walk the edge of the diamond. Then we end the diamond loop, the sensor, loop, and the function ends. Here is what it looks like as it runs:

And here is the final output, showing the correct solution to part two:

select diamond_walker();

Trying sensor #21 at 3,505,508 / 2,805,537 taxi is 532,165  [4,300,000 candidates]  [OOB: 1,865,312]
ERROR:  Winner from edges of sensor at 3505508,2805537. Spot 3879585,2647448 has a value of 15518342647448
CONTEXT:  PL/pgSQL function diamond_walker() line 92 at RAISE

The complete function:

CREATE or replace FUNCTION diamond_walker()
  returns void language plpgsql as $$
DECLARE
  maxrange int = 4000000;
  myrec record;
  item int = 0;
  mysensor record;
  current_top_row int; current_bottom_row int;
  current_left_col int; current_right_col int;
  northwest bool; northeast bool; southwest bool; southeast bool;
  oob int = 0;
BEGIN

  FOR myrec IN
    SELECT row_number() OVER (ORDER BY taxi), row, col, taxi FROM sensorfarm
  LOOP

  current_top_row = myrec.row;
  current_bottom_row = myrec.row;
  current_left_col = myrec.col - (myrec.taxi + 1);
  current_right_col = myrec.col + (myrec.taxi + 1);

  /* Walk the outside of the diamond */
  FOR t IN 1 .. myrec.taxi+1 LOOP
    /* Every once in a while, give a status report */
    IF (item % 10000) = 0 THEN
      RAISE NOTICE 
      '%Trying sensor #% at % / % taxi is %  [% candidates]  [OOB: %]', E'\033c',
        myrec.row_number,
        to_char(myrec.col, 'FM999,999,999'), to_char(myrec.row, 'FM9,999,999'),
        to_char(myrec.taxi, 'FM9,999,999'), to_char(item, 'FM999,999,999'),
        to_char(oob, 'FM999,999,999');
    END IF;

    item = item + 1;

    /* There are cases where the whole diamond has wandered out of range
       If that happens, we abandon the entire loop */
    if current_left_col > maxrange
      or current_right_col < 0
      or current_top_row > maxrange
      or current_bottom_row < 0
      THEN exit;
    END IF;

    /* Set each direction as not found */
    northwest = false; northeast = false; southwest = false; southeast = false;

    /* Skip if this candidate is out of range */
    IF current_left_col < 0
      THEN northwest = true; southwest = true; oob = oob + 1;
    END IF;
    IF current_right_col > maxrange
      THEN northeast = true; southeast = true; oob = oob + 1;
    END IF;
    IF current_top_row < 0
      THEN northwest = true; northeast = true; oob = oob + 1;
    END IF;
    IF current_bottom_row > maxrange
      THEN southwest = true; southeast = true; oob = oob + 1;
    END IF;

    /* Do any other diamonds contain this spot? */
    FOR mysensor in SELECT col,row,taxi FROM sensorfarm WHERE ctid <> myrec.ctid LOOP
      /* Does this sensor cover our "northwest" pixel? */
      IF not northwest AND abs(mysensor.col - current_left_col) 
           + abs(mysensor.row - current_top_row) <= mysensor.taxi
        THEN northwest = true;
      END IF;
      /* Does this sensor cover our "northeast" pixel? */
      IF not northeast AND abs(mysensor.col-current_right_col) 
           + abs(mysensor.row-current_top_row) <= mysensor.taxi
        THEN northeast = true;
      END IF;
      /* Does this sensor cover our "southwest" pixel? */
      IF not southwest AND abs(mysensor.col - current_left_col) 
           + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
        THEN southwest = true;
      END IF;
      /* Does this sensor cover our "southeast" pixel? */
      IF not southeast AND abs(mysensor.col - current_right_col) 
           + abs(mysensor.row - current_bottom_row) <= mysensor.taxi
        THEN southeast = true;
      END IF;
      /* Stop checking other sensors if all four directions are covered */
      IF northwest and northeast and southwest and southeast
        THEN exit;
      END IF;
    END LOOP;

    /* If any of our four directions had no coverage, they are the winner */
    IF not northeast or not northwest or not southeast or not southwest
      THEN
      mysensor.col = CASE when northeast or southeast
        THEN current_right_col else current_left_col END;
      mysensor.row = CASE when northeast or northwest
        THEN current_top_row else current_bottom_row END;
      RAISE 'Winner from row %, edges of sensor at %,%. Spot %,% has a value of %',
        myrec.row_number, myrec.col, myrec.row, mysensor.col, mysensor.row,
        mysensor.col * maxrange::bigint + mysensor.row;
    END IF;

    current_left_col = current_left_col + 1;
    current_right_col = current_right_col - 1;
    current_top_row = current_top_row - 1;
    current_bottom_row = current_bottom_row + 1;

  END LOOP; /* outside of the diamond */

END LOOP; /* each sensor */

return; END; $$;

This function is preloaded, so if you are brave enough, and have lots of time on your hands, you can try it out:

SELECT diamond_walker();

That one was a doozy, but we got it solved! Stay tuned for more solutions soon.

Loading terminal...

Loading terminal...