This article will contain spoilers both on how to solve Day 1's
challenge "Calorie Counting" in SQL. Data
has been loaded for you already in a table called calories.
Data Modeling
We have preloaded our data in a table calories with a single text column,
calories_count. This data file used a blank line for a group separator, which
we will need to be aware of in our solution.
Now that we have a table with one row per line, how can we group it? Grouping
similar rows together is a job for window functions. In this case, we need to
somehow put all rows together until we hit an empty line, then we need to start
a new group. Let's create a pseudo-column and use a sequence to increment it
only when we find an empty value in our calories_count column. We will also
call setval() on this column in order to define an initial value so the
currval() function will work.
CREATE SEQUENCE aoc;
SELECT setval('aoc', 1);
SELECT calories_count,
CASE WHEN calories_count = '' THEN nextval('aoc')
ELSE currval('aoc') END
FROM calories LIMIT 10;
calories_count | currval ----------------+--------- 9686 | 1
10178 | 1 3375 | 1 9638 | 1
6318 | 1 4978 | 1 5988 | 1
6712 | 1 | 2 10422 | 2
As we can see, the currval changes when it detects a different group. Let’s
use this property to find out the sum of each group. Note that since this is a
text field, we will need to convert to an int in order to sum these fields.
Additionally, since the blank line cannot convert to an int, we will
explicitly detect when the row we’re looking at is the separator and consider
its value 0.
Group and Sum
SELECT SUM(calories_count) OVER(partition by currval) FROM
(SELECT CASE WHEN calories_count = '' THEN 0
ELSE calories_count :: int END AS calories_count,
CASE WHEN calories_count = '' THEN nextval('aoc')
ELSE currval('aoc') END
FROM calories
) x LIMIT 10;
sum
-------
56873
56873
56873
56873
56873
56873
56873
56873
43456
43456
In this case since we are using a window function this will calculate the sum of those values once for each row, however the total for the group will be the same.
Maximum Value
Note that we don't care which row it came from, nor do we care about the
duplicate entries. All we care about is the maximum value in this new table.
Rather than a subselect, let's
use a CTE to make
things look better. Since one of the goals is to do this in as few SQL
statements as possible, let's put the setval() into the top of the CTE:
WITH setup AS (SELECT setval('aoc',1)),
x AS (SELECT CASE WHEN calories_count = '' THEN 0
ELSE calories_count :: int END AS calories_count,
CASE WHEN calories_count = '' THEN nextval('aoc')
ELSE currval('aoc') END
FROM setup, calories),
y AS (SELECT sum(calories_count) OVER(partition by currval) FROM x)
SELECT max(sum) FROM y;
max
-------
69206
(1 row)
Voila! This returns the highest combined number for all the groups.