This article will contain spoilers both on how to solve Day 1's challenge "Calorie Counting" in SQL. Data has been loaded for you already in a table called
We have preloaded our data in a table
calories with a single text column,
calories_count. This data file used a blank line for a group separator, which we will need to be aware of in our solution.
Now that we have a table with one row per line, how can we group it? Grouping similar rows together is a job for window functions. In this case, we need to somehow put all rows together until we hit an empty line, then we need to start a new group. Let's create a pseudo-column and use a sequence to increment it only when we find an empty value in our
calories_count column. We will also call
setval() on this column in order to define an initial value so the
currval() function will work.
CREATE SEQUENCE aoc; SELECT setval('aoc', 1); SELECT calories_count, CASE WHEN calories_count = '' THEN nextval('aoc') ELSE currval('aoc') END FROM calories LIMIT 10;
calories_count | currval ----------------+--------- 9686 | 1 10178 | 1 3375 | 1 9638 | 1 6318 | 1 4978 | 1 5988 | 1 6712 | 1 | 2 10422 | 2
As we can see, the
currval changes when it detects a different group. Let’s use this property to find out the sum of each group. Note that since this is a
text field, we will need to convert to an
int in order to sum these fields. Additionally, since the blank line cannot convert to an
int, we will explicitly detect when the row we’re looking at is the separator and consider its value
SELECT SUM(calories_count) OVER(partition by currval) FROM (SELECT CASE WHEN calories_count = '' THEN 0 ELSE calories_count :: int END AS calories_count, CASE WHEN calories_count = '' THEN nextval('aoc') ELSE currval('aoc') END FROM calories ) x LIMIT 10;
sum ------- 56873 56873 56873 56873 56873 56873 56873 56873 43456 43456
In this case since we are using a window function this will calculate the sum of those values once for each row, however the total for the group will be the same.
Note that we don't care which row it came from, nor do we care about the duplicate entries. All we care about is the maximum value in this new table. Rather than a subselect, let's use a CTE to make things look better. Since one of the goals is to do this in as few SQL statements as possible, let's put the
setval() into the top of the CTE:
WITH setup AS (SELECT setval('aoc',1)), x AS (SELECT CASE WHEN calories_count = '' THEN 0 ELSE calories_count :: int END AS calories_count, CASE WHEN calories_count = '' THEN nextval('aoc') ELSE currval('aoc') END FROM setup, calories), y AS (SELECT sum(calories_count) OVER(partition by currval) FROM x) SELECT max(sum) FROM y;
max ------- 69206 (1 row)
Voila! This returns the highest combined number for all the groups.