Latest posts from Paul Ramsey

  • 8 min read

    Parquet and Postgres in the Data Lake

    Paul Ramsey

    A couple weeks ago, I came across a blog from Retool on their experience migrating a 4TB database. They put in place some good procedures and managed a successful migration, but the whole experience was complicated by the size of the database. The size of the database was the result of a couple of very large "logging" tables: an edit log and an audit log. The thing about log tables is, they don't change much. They are append-only by design. They are also queried fairly irregularly, and the qu...

    Read More
  • 5 min read

    PostGIS vs GPU: Performance and Spatial Joins

    Paul Ramsey

    Every once in a while, a post shows up online about someone using GPUs for common spatial analysis tasks, and I get a burst of techno-enthusiasm. Maybe this is truly the new way! This week it was a post on GPU-assisted spatial joins that caught my eye. In summary, the author took a 9M record set of parking infractions data from Philadelphia and joined it to a 150 record set of Philadelphia neighborhoods . The process involved building up a little execution engine in Python. It was pretty ma...

    Read More
  • 5 min read

    Elevation Profiles and Flightlines with PostGIS

    Paul Ramsey

    A community member on the postgis-users mailing list had a question recently: I have a table of elevation points, and I would like to figure out an elevation profile for a flightline running through those points. How? This question is a nice showcase of some of my favorite spatial tools with indexing, point to point distance queries on a sphere, and nearest neighbor queries. I thought it would make a great post. The original question author was nice enough to share his elevation data, so I c...

    Read More
  • 4 min read

    PostGIS 3.2 New and Improved

    Paul Ramsey

    Last month, just under the wire for a 2021 release, the 3.2 version of PostGIS hit the streets! This new PostGIS also supports the latest 3.10 release of GEOS, which underpins a few of the new features. The extension uses the GDAL raster library for things like vectorizing rasters and rasterizing vectors (yes!). This release exposed a few more cool GDAL algorithms. • The new ST_InterpolateRaster function allows collections of measurement points to be interpolated into a continuous raster s...

    Read More
  • 5 min read

    Extracting and Substituting Text with Regular Expressions in PostgreSQL

    Paul Ramsey

    While supporting Crunchy Spatial and Crunchy Bridge clients, I’ve been thinking about how I usually clean messy data. I wanted to talk about regular expressions ( regex ) and Postgres. Regular expressions get a bad rap. They're impossible to read, they're inconsistently implemented in different platforms, they can be slow to execute. All of these things may be true, and yet: if you don't know regular expressions yet, you are missing a key skill for data manipulation that you will use throu...

    Read More
  • 3 min read

    Tricks for Faster Spatial Indexes

    Paul Ramsey

    One of the curious aspects of spatial indexes is that the nodes of the tree can overlap, because the objects being indexed themselves also overlap. That means that if you're searching an area in which two nodes overlap, you'll have to scan the contents of both nodes. For a trivial example above, that's not a big deal, but if an index has a lot of overlap, the extra work can add up to a measurable query time difference. The PostGIS spatial index is based on a R-tree structure, which naturally t...

    Read More
  • 3 min read

    Fast, Flexible Summaries with Aggregate Filters and Windows

    Paul Ramsey

    PostgreSQL can provide high performance summaries over multi-million record tables, and supports some great SQL sugar to make it concise and readable, in particular aggregate filtering, a feature unique to PostgreSQL and SQLite . A huge amount of reporting is about generating percentages: for a particular condition, what is a value relative to a baseline. Here's a quick "sales table" with three categories ("a" and "b" and "c") and one million random values between 0 and 10: In the bad-old-days,...

    Read More
  • 6 min read

    Waiting for PostGIS 3.2: Secure Cloud Raster Access

    Paul Ramsey

    Raster data access from the spatial database is an important feature, and the coming release of PostGIS will make remote access more practical by allowing access to private cloud storage. Previous versions could access rasters in public buckets, which is fine for writing blog posts , but in the real world people frequently store their data in private buckets, so we clearly needed the ability to add security tokens to our raster access. Putting rasters in a database is not necessarily a good ide...

    Read More
  • 6 min read

    Waiting for PostGIS 3.2: ST_Contour and ST_SetZ

    Paul Ramsey

    One theme of the 3.2 release is new analytical functionality in the raster module, and access to cloud-based rasters via the "out-db" option for rasters. Let's explore two new functions and exercise cloud raster support at the same time. Can you believe that there is a complete raster data set of all SRTM elevation data online, in cloud optimized GeoTIFF format? It's true, there is (and much more), at OpenTopography ! The SRTM data set is a collection of 14380 files, with a pixel size o...

    Read More
  • 4 min read

    Waiting for PostGIS 3.2: ST_MakeValid

    Paul Ramsey

    One of the less visible improvements coming in PostGIS 3.2 (via the GEOS 3.10 release) is a new algorithm for repairing invalid polygons and multipolygons. Algorithms like polygon intersection, union and difference rely on guarantees that the structure of inputs follows certain rules. We call geometries that follow those rules "valid" and those that do not "invalid". The rules are things like: • Polygon rings should not cross themselves • Polygon rings should not cross other rings • Multipol...

    Read More