I'll admit it: I used to have triskaidekaphobia, otherwise known as a "fear or avoidance of the number 13." After all, I grew up in an area with many tall buildings where the elevators have no 13th floor, so it seems like I could rationalize this irrationality. I've been able to push aside this phobia, but I can completely empathize with those who don't like dealing with 13.
Which comes to our topic today: the upcoming PostgreSQL 13 release.
The fear of the number 13 was enough to prompt discussions on the upcoming version number, which given a lot of factors (including the relatively recent number scheme change) the community opted to stick to the status quo.
But the aim of this post is not dive into the psychology behind the number thirteen, but rather go into why Postgres 13 is a special, if not lucky, release.
Similar to what I wrote last year, PostgreSQL 13 is not a "This is the FEATURE release" that we've had in previous years (Partitioning! Logical Replication! Parallelism! Upsert! JSON! etc.), but rather improves upon many existing features based on real-world PostgreSQL experiences. In fact, I'd call PostgreSQL 13 the "something for everyone" release: there is very likely a feature that you will immediate benefit from by upgrading to PostgreSQL 13.
So while I encourage you to check out the Postgres 13 release notes (which at the time of this writing is still a draft), let me discuss some of the features that make this release very lucky!
The B-tree index is the most commonly used index in PostgreSQL: it is the type of index you create when running a "CREATE INDEX" command. As a general rule of thumb, smaller B-tree indexes are a good thing: they take up less space on disk and they are typically faster to search over. Both of these attributes have both performance and cost savings benefits.
One of the notable features of PostgreSQL 13 is an enhancement to B-tree indexes that more efficiently store duplicates entries. In other words, if you have a B-tree index that contains repeat values, you can take advantage of this feature in Postgres 13 to have your index take up less space on disk!
Let's take a look at how this works. Let's create a somewhat contrived data set using the following SQL:
CREATE TABLE abc (x int PRIMARY KEY, y int); CREATE INDEX ON abc (y); INSERT INTO abc SELECT x, x % 16378 FROM generate_series(1,1000000) x;
I ran this SQL in both a Postgres 12 and 13 instance. I was able to get the size of the indexes using the following query:
with the following results:
- Postgres 12: 28MB
- Postgres 13: 9.5MB
That's pretty neat: the index is certainly smaller in PostgreSQL 13! How about performance?
I'll say this: query performance and benchmarking is a very deep topic, so I will try to give some directional guidance. I ran the following query on both instances and averaged out the times:
EXPLAIN ANALYZE SELECT count(*) FROM abc WHERE y = 50;
Directionally, the query ran 2x faster on Postgres 13, though I will give the caveat of YMMV (your mileage may vary).
If you're upgrading from a previous version of PostgreSQL, you will have to run a REINDEX on your existing B-tree indexes to take advantage of the deduplication functionality...but hey, PostgreSQL 12 introduced REINDEX CONCURRENTLY so you don't have to take a downtime hit!
...or so I would think. Our apartment is tiny enough that I've never tried to parallelize our vacuuming process, but in theory more vacuums would help.
The good news is that PostgreSQL does have its very own VACUUM, and this vacuum is an essential piece of PostgreSQL maintenance. Vacuuming is part of PostgreSQL's muliversion concurrency control system and, in short, cleans up rows that are no longer visible, e.g. because they were updated or deleted. Also, use autovacuum.
As your dataset gets larger, vacuuming takes a longer time to complete, which can be far from ideal. But fortune smiles upon you in PostgreSQL 13, as it introduces the ability to vacuum indexes in parallel.
This new feature works by default in PostgreSQL 13: when you run a VACUUM on a table, any indexes will be processed in parallel. If you want, you can also specify the PARALLEL directive to control how many parallel VACUUM jobs you want running on your indexes.
The benefit? You can clean your databases more quickly, ensuring you can maintain a healthy Postgres system!
In an algorithms class, it's typical to look at the pain of sorting from a performance standpoint. If you have data that's already sorted and you need to use it later on as part of a different sort, it helps to have it presorted!
Enter incremental sorting.
In Postgres 13, sorted data from earlier in a query can be used to speed up a later part of the query. This can apply to a lot of common queries, such as sorting over multiple columns. Let's see this by example.
Using the table above, first ensure that we have dropped the index on column "y" and we have run an analyze:
DROP INDEX abc_y_idx; ANALYZE abc;
Let's run the following query on both Postgres 12 and Postgres 13:
EXPLAIN ANALYZE SELECT * FROM abc WHERE x % 13 = 0 ORDER BY x, y DESC LIMIT 13;
On Postgres 12, you may see output similar to this:
Limit (cost=11723.98..11725.50 rows=13 width=8) (actual time=37.035..39.227 rows=13 loops=1) -> Gather Merge (cost=11723.98..12210.05 rows=4166 width=8) (actual time=37.035..39.225 rows=13 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=10723.96..10729.16 rows=2083 width=8) (actual time=34.181..34.182 rows=13 loops=3) Sort Key: x, y DESC Sort Method: top-N heapsort Memory: 25kB Worker 0: Sort Method: top-N heapsort Memory: 25kB Worker 1: Sort Method: top-N heapsort Memory: 25kB -> Parallel Seq Scan on abc (cost=0.00..10675.00 rows=2083 width=8) (actual time=0.021..31.731 rows=25641 loops=3) Filter: ((x % 13) = 0) Rows Removed by Filter: 307692 Planning Time: 0.044 ms Execution Time: 39.242 ms
whereas on Postgres 13:
Limit (cost=7.52..100.14 rows=13 width=8) (actual time=0.059..0.061 rows=13 loops=1) -> Incremental Sort (cost=7.52..35633.43 rows=5000 width=8) (actual time=0.058..0.059 rows=13 loops=1) Sort Key: x, y DESC Presorted Key: x Full-sort Groups: 1 Sort Method: quicksort Average Memory: 25kB Peak Memory: 25kB -> Index Scan using abc_pkey on abc (cost=0.42..35408.43 rows=5000 width=8) (actual time =0.022..0.050 rows=14 loops=1) Filter: ((x % 13) = 0) Rows Removed by Filter: 168 Planning Time: 0.114 ms Execution Time: 0.082 ms
Again, with the caveat that the execution times are directional, we can see a fairly significant speedup between the queries on the different PostgreSQL versions. The new incremental sorting feature in Postgres 13 allows you to leverage the index for the first part of the ordering, which provides a performance boost.
Incremental sort in PostgreSQL 13 is one of those features that could help improve the performance of your data workloads just by upgrading, if you happen to run queries like the above.
There are many, many more features in PostgreSQL 13, so much so that this blog post could end up turning into a tome. A few highlights that I know could apply to past workloads that I managed (and in no particular order):
- The advanced query planner statistics available in PostgreSQL via CREATE STATISTICS can help with plans with "IN/ANY" operations as well as "OR" clauses. I definitely used these a lot in my previous life, so very excited for this feature!
- Many improvements to partitioning, including query performance for joins across partitioned tables, full support for logical replication, and the addition of "BEFORE" triggers for row-level operations.
- Hash aggregation, used with aggregate functions and grouping sets, can be used with a greater number of queries: if the hash aggregate is too large to fit all into memory, it will now spill over to disk. More performance for a greater set of queries!
- Connection drivers that use libpq can now require channel binding when performing SCRAM authentication. Oh, and upgrade your passwords to SCRAM.
- A lot of functional conveniences: there is now a "datetime" function in the JSON path query language, and "gen_random_uuid" can now be used without an extension!
- The PostgreSQL foreign data wrapper, postgres_fdw, used to connect to remote Postgres databases, now supports certificate authentication.
and many more. I really do encourage you to check out the release notes.
So far, the PostgreSQL 13 release schedule has proceeded on time. It still appears that Postgres 13 will be made generally available at its expected time late in the third quarter. Given the unprecedented challenges that many in the PostgreSQL Global Development Group faced this year, let alone everyone, it is quite amazing that the community has pulled together to help ensure another successful PostgreSQL release.
It's seeing things like the above that helped alleviate my triskaidekaphobia, and all the more reason I'm convinced that Postgres 13 is special: in these unordinary times, the PostgreSQL community has again proven its ability to produce a reliable database system that continues to make it even easier to build data-driven applications.
So take a chance on Postgres 13!
Jonathan S. Katz
August 25, 2020 •More by this author