Waiting for PostGIS 3: Hilbert Geometry Sorting
With the release of PostGIS 3.0, queries that ORDER BY
geometry columns will return rows using a Hilbert curve ordering, and do so about twice as fast.
Whuuuut!?!
The history of "ordering by geometry" in PostGIS is mostly pretty bad. Up until version 2.4 (2017), if you did ORDER BY
on a geometry column, your rows would be returned using the ordering of the minimum X coordinate value in the geometry.
One of the things users expect of "ordering" is that items that are "close" to each other in the ordered list should also be "close" to each other in the domain of the values. For example, in a set of sales orders ordered by price, rows next to each other have similar prices.
To visualize what geometry ordering looks like, I started with a field of 10,000 random points.
CREATE SEQUENCE points_seq;
CREATE TABLE points AS
SELECT (ST_Dump
( ST_GeneratePoints(
ST_MakeEnvelope(-10000,-10000,10000,10000,3857),
10000)
)).geom AS geom,
nextval('points_seq') AS pk;
Now select from the table, and use ST_MakeLine() to join them up in the desired order. Here's the original ordering, prior to version 2.4.
SELECT ST_MakeLine(geom ORDER BY ST_X(geom)) AS geom
FROM points;
Each point in the ordering is close in the X coordinate, but since the Y coordinate can be anything, there's not much spatial coherence to the ordered set. A better spatial ordering will keep points that are near in space also near in the set.
For 2.4 we got a little clever. Instead of sorting by the minimum X coordinate, we took the center of the geometry bounds, and created a Morton curve out of the X and Y coordinates. Morton keys just involve interleaving the bits of the values on the two axes, which is a relatively cheap operation.
The result is way more spatially coherent.
For 3.0, we have replaced the Morton curve with a Hilbert curve. The Hilbert curve is slightly more expensive to calculate, but we offset that by stripping out some wasted cycles in other parts of the old implementation, and the new code is now faster, and also even more spatially coherent.
SELECT ST_MakeLine(geom ORDER BY geom) AS geom
FROM points;
PostGIS 3.0 will be released some time this fall, hopefully before the initial release of PostgreSQL 12.
Related Articles
- Postgres Partitioning with a Default Partition
16 min read
- Iceberg ahead! Analyzing Shipping Data in Postgres
8 min read
- PostGIS Day 2024 Summary
8 min read
- Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
8 min read
- Loading the World! OpenStreetMap Import In Under 4 Hours
6 min read