Latest posts from Paul Ramsey

  • Generate Unlimited Crypto Using Postgres!

    Paul Ramsey

    Ha ha, made you look! This post is not a crazy scam (you be the judge) but just a practical description of using cryptographical algorithms to encrypt and decrypt data inside PostgreSQL. There's already a lot of encryption in Crunchy Bridge ! First, your data are "encrypted at rest". That means that the "volumes" (what in an earlier era would be called the disk drives) your data is saved to are encrypted. Also all the backup files generated by your server are encrypted. In practice, this means...

    Read More
  • Rise of the Anti-Join

    Paul Ramsey

    Find me all the things in set "A" that are not in set "B". This is a pretty common query pattern, and it occurs in both non-spatial and spatial situations. As usual, there are multiple ways to express this query in SQL, but only a couple queries will result in the best possible performance. The non-spatial setup starts with two tables with the numbers 1 to 1,000,000 in them, then deletes two records from one of the tables. The spatial setup is a 2M record table of geographic names, and a 3K rec...

    Read More
  • Postgres Indexing: When Does BRIN Win?

    Paul Ramsey

    The PostgreSQL BRIN index is a specialized index for (as the documentation says) "handling very large tables in which certain columns have some natural correlation with their physical location within the table". For data of that sort, BRIN indexes offer extremely low insert costs (good for high velocity data) and extremely small index sizes (good for high volume data). But what data has this "natural correlation"? Most frequently, data with a timestamp that is continuously adding new rows. • A l...

    Read More
  • Choosing a PostgreSQL Number Format

    Paul Ramsey

    It should be the easiest thing in the world: you are modeling your data and you need a column for some numbers, what type do you use? PostgreSQL offers a lot of different number types, and they all have advantages and limitations. You want the number type that is going to: • Store your data using the smallest amount of space • Represent your data with the smallest amount of error • Manipulate your data using the correct logic Store your data using the smallest amount of space Represent your da...

    Read More
  • 6 min read

    Postgres' Clever Query Planning System

    Paul Ramsey

    The sheer cleverness of relational databases is often discounted because we so frequently use them for very simple data management tasks. Serialize an object into a row, store with unique key. yawwwn Search for unique key, deserialize row into an object. yawwwwwwn The real power of relational databases is juggling "relations" (aka tables) in large numbers and figuring out on-the-fly the most effective way to filter out rows and find an answer. PostgreSQL has an undeniably clever query plannin...

    Read More
  • 5 min read

    Instant Heatmap with pg_featureserv

    Paul Ramsey

    The pg_featureserv micro-service is a thin middleware that binds tables and functions in a PostgreSQL database to a JSON collections API, accessible over HTTP. Using the Crunchy Bridge container apps , I'm going to give a quick overview of how to set up a web based spatial heatmap from Postgres. The application uses PostgreSQL to store and search 2.2M geographic names in the USA. Type in the search box and the auto-fill form will find candidate words. Select a word, and the database will pe...

    Read More
  • 8 min read

    Parquet and Postgres in the Data Lake

    Paul Ramsey

    A couple weeks ago, I came across a blog from Retool on their experience migrating a 4TB database. They put in place some good procedures and managed a successful migration, but the whole experience was complicated by the size of the database. The size of the database was the result of a couple of very large "logging" tables: an edit log and an audit log. The thing about log tables is, they don't change much. They are append-only by design. They are also queried fairly irregularly, and the qu...

    Read More
  • 5 min read

    PostGIS vs GPU: Performance and Spatial Joins

    Paul Ramsey

    Every once in a while, a post shows up online about someone using GPUs for common spatial analysis tasks, and I get a burst of techno-enthusiasm. Maybe this is truly the new way! This week it was a post on GPU-assisted spatial joins that caught my eye. In summary, the author took a 9M record set of parking infractions data from Philadelphia and joined it to a 150 record set of Philadelphia neighborhoods . The process involved building up a little execution engine in Python. It was pretty ma...

    Read More
  • 5 min read

    Elevation Profiles and Flightlines with PostGIS

    Paul Ramsey

    A community member on the postgis-users mailing list had a question recently: I have a table of elevation points, and I would like to figure out an elevation profile for a flightline running through those points. How? This question is a nice showcase of some of my favorite spatial tools with indexing, point to point distance queries on a sphere, and nearest neighbor queries. I thought it would make a great post. The original question author was nice enough to share his elevation data, so I c...

    Read More
  • 4 min read

    PostGIS 3.2 New and Improved

    Paul Ramsey

    Last month, just under the wire for a 2021 release, the 3.2 version of PostGIS hit the streets! This new PostGIS also supports the latest 3.10 release of GEOS, which underpins a few of the new features. The extension uses the GDAL raster library for things like vectorizing rasters and rasterizing vectors (yes!). This release exposed a few more cool GDAL algorithms. • The new ST_InterpolateRaster function allows collections of measurement points to be interpolated into a continuous raster s...

    Read More