Tutorial Instructions

Using FILTER vs CTEs and CASE WHEN

For developers who have been around SQL long enough, they know that there is often more than one way to get the same result. So, for this tutorial, we will generate a common report in SQL with three different approaches. Experts may use any of these approaches, depending on the situation. Generally, our recommendation is: choose the approach that makes your SQL readable. First, we will use a common table express (CTE), which is a nice method or organizing subqueries. Second, we use CASE statements within aggregation context to filter out the values we want. Lastly, we use FILTER to clean up the syntax, but effectively do the same as the CASE statement. Each of these approaches can work just fine, but in this scenario, using FILTER keeps the SQL cleaner.

The report we're going to work to generate is a monthly report of revenue from an invoices table. We'll want our end report to look something like:

    mnth    | billed  | uncollected |  collected
------------+-----------------------+--------------+--
 2023-02-01 | 1498.06 |     1498.06 |           0
 2023-01-01 | 2993.95 |     1483.04 |      1510.91
 2022-12-01 | 1413.17 |      382.84 |      1030.33
 2022-11-01 | 1378.18 |      197.52 |      1180.66
 2022-10-01 | 1342.91 |      185.03 |      1157.88
 2022-09-01 | 1299.90 |       88.01 |      1211.89
 2022-08-01 | 1261.97 |       85.29 |      1176.68

Before we get started, run the following to turn off the expanded display. Expanded display makes reading individual records nicer, but tables show cleaner for aggregated results:

\x

Invoices Table

First, let’s look at the underlying data. Show the details of the invoices table:

\d invoices

Renders the output:


                                              Table "public.invoices"
       Column       |              Type              | Collation | Nullable |               Default
--------------------+--------------------------------+-----------+----------+--------------------------------------
 id                 | bigint                         |           | not null | nextval('invoices_id_seq'::regclass)
 account_id         | integer                        |           |          |
 net_total_in_cents | integer                        |           |          |
 invoice_period     | daterange                      |           |          |
 status             | text                           |           |          |
 created_at         | timestamp(6) without time zone |           | not null |
 updated_at         | timestamp(6) without time zone |           | not null |
Indexes:
    "invoices_pkey" PRIMARY KEY, btree (id)
    "index_invoices_on_account_id" btree (account_id)

We’ve made a few stylistic choices here for this invoices table.

1) The invoice_period is a date range. The lower value on the range is the start of the period, and the upper value is the end of the period. To extract the first value of this range, use lower(invoice_period).

2) We name the net_total_in_cents field because it contains cents instead of dollars, and we store it as an integer instead of a float. This prevents fractional cents.

Now, let’s look at a few records:

SELECT * FROM invoices LIMIT 10;

Which returns:

 id | account_id | net_total_in_cents |     invoice_period      | status |         created_at         |         updated_at         
----+------------+--------------------+-------------------------+--------+----------------------------+----------------------------
  1 |          4 |               1072 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.420197 | 2023-02-01 17:28:37.420197
  2 |          6 |                955 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.422361 | 2023-02-01 17:28:37.422361
  3 |          7 |                322 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.423726 | 2023-02-01 17:28:37.423726
  4 |          9 |                778 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.425423 | 2023-02-01 17:28:37.425423
  5 |         10 |                338 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.426731 | 2023-02-01 17:28:37.426731
  6 |         12 |                386 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.428382 | 2023-02-01 17:28:37.428382
  7 |         16 |                 66 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.430613 | 2023-02-01 17:28:37.430613
  8 |         18 |                283 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.432194 | 2023-02-01 17:28:37.432194
  9 |         25 |                508 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.43536  | 2023-02-01 17:28:37.43536
 10 |         26 |                746 | [2022-02-01,2022-02-28) | paid   | 2023-02-01 17:28:37.436662 | 2023-02-01 17:28:37.436662
(10 rows)

CTEs

Let’s start with an example that uses CTEs (Common Table Expressions) or sometimes referred to as WITH clauses.

At first, the following can look complex, but just think of it as 3 different queries aggregated into a single query. The total , collected, and invoiced queries find their respective values for each month, then the final query joins those values together based on the mnth value.

WITH totals AS ( SELECT lower(i.invoice_period) as mnth, SUM(i.net_total_in_cents) / 100.0 as net_total FROM invoices i GROUP BY 1 ), collected AS ( SELECT lower(i.invoice_period) as mnth, SUM(i.net_total_in_cents) / 100.0 as amount FROM invoices i WHERE status = 'paid' GROUP BY 1 ), invoiced AS ( SELECT lower(i.invoice_period) AS mnth, sum(i.net_total_in_cents) / 100.0 AS amount FROM invoices i WHERE status = 'invoiced' GROUP BY 1 ) SELECT totals.mnth, totals.net_total as billed, COALESCE(invoiced.amount, 0) as uncollected, COALESCE(collected.amount, 0) as collected FROM totals LEFT JOIN invoiced ON totals.mnth = invoiced.mnth LEFT JOIN collected on totals.mnth = collected.mnth ORDER BY 1 desc;

The output will show one record per month with the total amount, uncollected amount, and collected amount for the invoices. To experiment dissecting this SQL, you can pluck each of the WITH statements and run them individually to see how they respond.

Case statements for conditional filtering

Another option here is leveraging the CASE statement for filtering in how we want to aggregate. For this scenario, the query will be significantly shorter than the use of CTEs.

SELECT LOWER(i.invoice_period) as mnth, SUM(i.net_total_in_cents) / 100.0 as billed, SUM (CASE WHEN status = 'invoiced' THEN i.net_total_in_cents END) / 100.00 as uncollected, SUM (CASE WHEN status = 'paid' THEN i.net_total_in_cents END) / 100.00 as collected FROM invoices i GROUP BY 1 ORDER BY 1 desc;

Think of CASE as an IF/THEN statement that returns the net_total_in_cents for the SUM if a condition is met. These types of case statements are fairly common in data analysis. But there is another option: FILTER

Using FILTER

FILTER is functionally similar to the CASE statement, but makes the SQL a bit more readable:

SELECT LOWER(i.invoice_period) as mnth, SUM (i.net_total_in_cents) / 100.0 as billed, SUM (i.net_total_in_cents) FILTER (WHERE status = 'invoiced') / 100.00 as uncollected, SUM (i.net_total_in_cents) FILTER (WHERE status = 'paid') / 100.00 as collected FROM invoices i GROUP BY 1 ORDER BY 1 desc;

From the standpoint of making your Postgres syntax as easy to read and streamlined, `FILTER` is a good tool to keep in your pocket.

Loading terminal...

Loading terminal...