Postgres Raster Query Basics
In geospatial terminology, a "raster" is a cover of an area divided into a uniform gridding, with one or more values assigned to each grid cell.
A "raster" in which the values are associated with red, green and blue bands might be a visual image. The rasters that come off the Landsat 7 earth observation satellite have eight bands: red, green, blue, near infrared, shortwave infrared, thermal, mid-infrared and panchromatic.
Working with raster data via SQL is a little counter-intuitive: rasters don't neatly fit the relational model the way vector geometries do. A table of parcels where one column is the geometry and the others are the owner name, address, and tax roll id makes sense. How should a raster fit into a table? As a row for every pixel? For every scan row? What other values should be associated with each row?
There is no clean relationship between "real world objects" and the database representation of a raster, because a raster has nothing to say about objects, it is just a collection of measurements.
We can squeeze rasters into the database, but doing so makes working with the data more complex. Before loading data, we need to enable PostGIS and the raster module.
CREATE EXTENSION postgis; CREATE EXTENSION postgis_raster;
For this example, we will load raster data for a "digital elevation model" (DEM), a raster with just one band, the elevation at each pixel.
Using the SRTM Tile Grabber I downloaded one tile of
old SRTM data. Then using the
utility, read out the metadata about the file.
wget http://srtm.csi.cgiar.org/wp-content/uploads/files/srtm_5x5/TIFF/srtm_12_03.zip unzip srtm_12_03.zip gdalinfo srtm_12_03.tif
The metadata tells me two useful things for loading the data:
- The coordinate system of the data is WGS 84.
- The pixel size is 3 arc-seconds.
- The pixel type is
Int16, so two bytes per pixel.
Knowing that, I can build a
raster2pgsql call to load the data into a raster
raster2pgsql \ -I \ # create a spatial index on the column -s 4326 \ # use 4326 (WGS 84) as the spatial reference for the raster -t 32x32 \ # tile the raster into 32 by 32 pixel tiles srtm_12_03.tif | \ # name of the raster psql dem # target database connection
Once loaded the raster table looks like this on a map.
And it looks like this in the database.
Table "public.srtm_12_03" Column | Type --------+--------- rid | integer rast | raster Indexes: "srtm_12_03_pkey" PRIMARY KEY, btree (rid) "srtm_12_03_st_convexhull_idx" gist (st_convexhull(rast))
It's a pretty boring table! Just a bunch of binary raster tiles and a unique key for each.
-- 29768 rows SELECT Count(*) FROM srtm_12_03;
Those binary raster tiles aren't just opaque blobs though, we can look inside them with the right functions. Here we get a summary of all the raster tiles in the table.
SELECT (ST_SummaryStatsAgg(rast, 1, true)).* FROM srtm_12_03;
count | 28966088 sum | 20431360140 mean | 705.3544869434907 stddev | 561.252765463607 min | -291 max | 4371
Tiles, Tiles, Tiles
Remember when we loaded the data with
raster2pgsql we specified a "tile size"
of 32 by 32 pixels? This has a number of implications.
- First, at 32x32, a tile of 2-byte
Int16data like our DEM will take up 2048 bytes. This is small enough to fit in the database page size, which means the data will not end up stored in a side table by the TOAST subsystem that handles large row values.
- Second, our input file had 6000x6000 pixels, which is enough pixels to
generate 35156 32x32 tiles. Our table only has 29768 rows, because
raster2pgsqldoes not generate tiles when the contents are all "no data" pixels (as the DEM data is over the ocean).
The loaded data looks like this.
Notice how small each tile is. As a general rule, when working with raster data queries,
- the first step will be to efficiently find the relevant tile(s);
- the second step will be to use the tiles to find the answer you want.
Finding tiles efficiently means using spatial index, and the spatial index definition as we saw above is this:
"srtm_12_03_st_convexhull_idx" gist (st_convexhull(rast))
This is a
which means in order to access it, we need to copy the functional part:
st_convechull(rast) when forming our query.
The ST_ConvexHull(raster) converts a raster tile into a polygon defining the boundary of the tile. When querying raster tables, you will use this function a great deal to convert rasters into polygons suitable for querying a spatial index.
The simplest raster query is to take a point, and find the value of the raster under that point.
Here is a point table with one point in it:
CREATE TABLE mappoint AS SELECT ST_Point(-123.7273, 47.8467, 4326)::geometry(Point, 4326) AS geom, 1 AS fid;
The nice thing about points is that they only hit one tile at a time. So we don't have to think too hard about what to do with our tile sets.
SELECT ST_Value(srtm.rast, pt.geom) FROM srtm_12_03 srtm JOIN mappoint pt ON ST_Intersects(pt.geom, ST_ConvexHull(srtm.rast))
st_value ---------- 1627
Here we use the ST_ConvexHull(raster) function to get access to our spatial index on the raster table, and the ST_Intersects(geom, geom) function to test the condition.
The ST_Value(raster, geom) function reads the pixel value from the raster at the location of the point.
Summarizing rasters under polygons is more involved than reading point values, because polygons will frequently overlap multiple tiles, so you have to think in terms of "sets of raster tiles" instead of "the raster" when building your query.
- We will use ST_ConvexHull(raster) and ST_Intersects(geom, geom) again to get the set of tiles that intersect the polygon.
- Then we will apply ST_Clip(raster, geometry) to mask out pixels on our tiles that are not inside the polygon.
- Then we will summarize the clipped tiles, to get the average elevation in our polygon.
The final complete query looks like this.
WITH clipped_tiles AS ( SELECT ST_Clip(srtm.rast, ply.geom) AS rast, srtm.rid FROM srtm_12_03 srtm JOIN mappoly ply ON ST_Intersects(ply.geom, ST_ConvexHull(srtm.rast)) ) SELECT (ST_SummaryStatsAgg(rast, 1, true)).* FROM clipped_tiles;
count | 362369 sum | 388175193 mean | 1071.2152336430545 stddev | 441.7982032761408 min | 108 max | 2374
Working with database rasters analytically can be challenging, particularly if you are used to thinking about them as single, unitary coverages. Remember to apply the basic rules of database rasters:
- Find the chips you need to answer your query.
- Make sure to use ST_ConvexHull(raster) to drive a spatial index filter.
- Assemble the chips as needed to answer the query (you might need to use the ST_Union(raster) aggregate).
- Carry out your actual raster query.
February 23, 2023 •More by this author