Paul Ramsey
Paul Ramsey
One of the temptations database users face, when presented with a huge table of interesting data, is to run queries that interrogate every record. Got a billion measurements? What’s the average of that?! One way to find out is to just calculate the average. For a billion records, that could take a while! Fortunately, the “Law of Large Numbers” is here to bail us out, stating that the average of a sample approaches the average of the population, as the sample size grows. And amazingly, the sample...
Read MorePaul Ramsey
Paul Ramsey
In this series , we talk about the many different ways you can speed up PostGIS. A common geospatial operation is to clip out a collection of smaller shapes that are contained within a larger shape. Today let's review the most efficient ways to query for things inside something else. Frequently the smaller shapes are clipped where they cross the boundary, using the ST_Intersection function. The naive SQL is a simple spatial join on ST_Intersects. When run on the small test area shown in the...
Read MorePaul Ramsey
Paul Ramsey
In the third installment of the PostGIS Performance series , I wanted to talk about performance around bounding boxes. Geometry data is different from most column types you find in a relational database. The objects in a geometry column can be wildly different in the amount of the data domain they cover, and the amount of physical size they take up on disk. The data in the “admin0” Natural Earth data range from the 1.2 hectare Vatican City, to the 1.6 billion hectare Russia, and from the 4 poin...
Read MorePaul Ramsey
Paul Ramsey
In this series , we talk about the many different ways you can speed up PostGIS. Today let’s talk about looking across the queries with pg_stat_statements and some basic tuning. A reasonable question to ask, if you are managing a system with variable performance is: “what queries on my system are running slowly?” Fortunately, PostgreSQL includes an extension called “pg_stat_statements” that tracks query performance over time and maintains a list of high cost queries. Now you will have to leave...
Read MorePaul Ramsey
Paul Ramsey
I am kicking off a short blog series on PostGIS performance fundamentals. For this first example, we will cover fundamental indexing. We will explore performance using the Natural Earth “admin0” (countries) data (258 polygons) and their “populated places” (7342 points). A classic spatial query is the “spatial join”, finding the relationships between objects using a spatial contain. “How many populated places are there within each country?” This returns an answer, but it takes 2200 milliseco...
Read More