Postgres Raster Query Basics

Avatar for Paul Ramsey

Paul Ramsey

6 min read

In geospatial terminology, a "raster" is a cover of an area divided into a uniform gridding, with one or more values assigned to each grid cell.

rows and columns showing how a raster works with a pixel

A "raster" in which the values are associated with red, green and blue bands might be a visual image. The rasters that come off the Landsat 7 earth observation satellite have eight bands: red, green, blue, near infrared, shortwave infrared, thermal, mid-infrared and panchromatic.

Database Rasters

Working with raster data via SQL is a little counter-intuitive: rasters don't neatly fit the relational model the way vector geometries do. A table of parcels where one column is the geometry and the others are the owner name, address, and tax roll id makes sense. How should a raster fit into a table? As a row for every pixel? For every scan row? What other values should be associated with each row?

There is no clean relationship between "real world objects" and the database representation of a raster, because a raster has nothing to say about objects, it is just a collection of measurements.

We can squeeze rasters into the database, but doing so makes working with the data more complex. Before loading data, we need to enable PostGIS and the raster module.

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_raster;

Loading Rasters

For this example, we will load raster data for a "digital elevation model" (DEM), a raster with just one band, the elevation at each pixel.

Using the SRTM Tile Grabber I downloaded one tile of old SRTM data. Then using the gdalinfo utility, read out the metadata about the file.

wget https://srtm.csi.cgiar.org/wp-content/uploads/files/srtm_5x5/TIFF/srtm_12_03.zip
unzip srtm_12_03.zip
gdalinfo srtm_12_03.tif

The metadata tells me two useful things for loading the data:

  • The coordinate system of the data is WGS 84.
  • The pixel size is 3 arc-seconds.
  • The pixel type is Int16, so two bytes per pixel.

Knowing that, I can build a raster2pgsql call to load the data into a raster table.

raster2pgsql \
    -I \                 # create a spatial index on the column
    -s 4326 \            # use 4326 (WGS 84) as the spatial reference for the raster
    -t 32x32 \           # tile the raster into 32 by 32 pixel tiles
    srtm_12_03.tif | \   # name of the raster
    psql dem             # target database connection

Once loaded the raster table looks like this on a map.

map of the pacific northwest of the united states with a topographic overlay

And it looks like this in the database.

Table "public.srtm_12_03"

 Column |  Type
--------+---------
 rid    | integer
 rast   | raster
Indexes:
    "srtm_12_03_pkey" PRIMARY KEY, btree (rid)
    "srtm_12_03_st_convexhull_idx" gist (st_convexhull(rast))

It's a pretty boring table! Just a bunch of binary raster tiles and a unique key for each.

-- 29768 rows
SELECT Count(*) FROM srtm_12_03;

Those binary raster tiles aren't just opaque blobs though, we can look inside them with the right functions. Here we get a summary of all the raster tiles in the table.

SELECT (ST_SummaryStatsAgg(rast, 1, true)).* FROM srtm_12_03;
count  | 28966088
sum    | 20431360140
mean   | 705.3544869434907
stddev | 561.252765463607
min    | -291
max    | 4371

Tiles, Tiles, Tiles

Remember when we loaded the data with raster2pgsql we specified a "tile size" of 32 by 32 pixels? This has a number of implications.

  • First, at 32x32, a tile of 2-byte Int16 data like our DEM will take up 2048 bytes. This is small enough to fit in the database page size, which means the data will not end up stored in a side table by the TOAST subsystem that handles large row values.
  • Second, our input file had 6000x6000 pixels, which is enough pixels to generate 35156 32x32 tiles. Our table only has 29768 rows, because raster2pgsql does not generate tiles when the contents are all "no data" pixels (as the DEM data is over the ocean).

The loaded data looks like this.

map of Port Angeles, Washington with topographic overlay

Notice how small each tile is. As a general rule, when working with raster data queries,

  • the first step will be to efficiently find the relevant tile(s);
  • the second step will be to use the tiles to find the answer you want.

Finding tiles efficiently means using spatial index, and the spatial index definition as we saw above is this:

"srtm_12_03_st_convexhull_idx" gist (st_convexhull(rast))

This is a functional index, which means in order to access it, we need to copy the functional part: st_convechull(rast) when forming our query.

The ST_ConvexHull(raster) converts a raster tile into a polygon defining the boundary of the tile. When querying raster tables, you will use this function a great deal to convert rasters into polygons suitable for querying a spatial index.

Point Query

The simplest raster query is to take a point, and find the value of the raster under that point.

Here is a point table with one point in it:

CREATE TABLE mappoint AS
SELECT ST_Point(-123.7273, 47.8467, 4326)::geometry(Point, 4326) AS geom,
       1 AS fid;

The nice thing about points is that they only hit one tile at a time. So we don't have to think too hard about what to do with our tile sets.

SELECT ST_Value(srtm.rast, pt.geom)
FROM srtm_12_03 srtm
JOIN mappoint pt
  ON ST_Intersects(pt.geom, ST_ConvexHull(srtm.rast))
 st_value
----------
     1627

Here we use the ST_ConvexHull(raster) function to get access to our spatial index on the raster table, and the ST_Intersects(geom, geom) function to test the condition.

The ST_Value(raster, geom) function reads the pixel value from the raster at the location of the point.

Polygon Query

Summarizing rasters under polygons is more involved than reading point values, because polygons will frequently overlap multiple tiles, so you have to think in terms of "sets of raster tiles" instead of "the raster" when building your query.

map of Port Angeles, Washington with topographic overlay and a grid showing the tiles

map of Port Angeles, Washington with

  • Then we will summarize the clipped tiles, to get the average elevation in our polygon.

The final complete query looks like this.

WITH clipped_tiles AS (
  SELECT ST_Clip(srtm.rast, ply.geom) AS rast, srtm.rid
  FROM srtm_12_03 srtm
  JOIN mappoly ply
    ON ST_Intersects(ply.geom, ST_ConvexHull(srtm.rast))
)
SELECT (ST_SummaryStatsAgg(rast, 1, true)).*
  FROM clipped_tiles;
count  | 362369
sum    | 388175193
mean   | 1071.2152336430545
stddev | 441.7982032761408
min    | 108
max    | 2374

Conclusion

Working with database rasters analytically can be challenging, particularly if you are used to thinking about them as single, unitary coverages. Remember to apply the basic rules of database rasters:

  • Find the chips you need to answer your query.
  • Make sure to use ST_ConvexHull(raster) to drive a spatial index filter.
  • Assemble the chips as needed to answer the query (you might need to use the ST_Union(raster) aggregate).
  • Carry out your actual raster query.