
Convert JSON into Columns and Rows with JSON_TABLE


Paul Ramsey

5 min read

JSON_TABLE, new in Postgres 17

If you missed the headlines and release notes, Postgres 17 added another big feature to its growing repository of strong JSON support: JSON_TABLE. JSON_TABLE lets you query JSON and present the results as if they were native relational data, so you can easily take JSON data feeds and work with them like any other data in your Postgres database.

Shaking the Earth with JSON_TABLE

A few days ago, I was awakened in the middle of the night when my house started to shake. Living in the Cascadia subduction zone, I wake up really fast when things start to shake, because you never know if this one is going to be the Big One.

Fortunately this one was only a little one, a 4.0 magnitude quake several miles north of the city, captured and memorialized on its own USGS earthquake page almost as soon as it had finished shaking.

The USGS keeps a near-real-time collection of information about the latest quakes online, served up in a variety of formats including GeoJSON.

The weekly feed of magnitude 4.5 and higher quakes has a nice amount of data in it.

If we could import this feed into the database, we could use it for other queries, like finding potential customers to sell tents and emergency supplies to! (When the big one hits me, sell me some tents and emergency supplies.)

This readily available GeoJSON earthquake file seems like the perfect chance to try out the new JSON_TABLE. And maybe give me something to do in the middle of the night.

Retrieving a JSON file with HTTP

The first step is to retrieve the feed. The simplest way is to use the http extension, which provides a simple functional API for making HTTP requests.

CREATE EXTENSION http;

The http_get(url) function returns an http_response, with a status code, content_type, headers and content. We could write a wrapper to check the status code, but for this example we will just assume the feed works and look at the content.

SELECT jsonb_pretty(content::jsonb)
  FROM http_get('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_week.geojson');
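
If you do want that status check, here is a minimal sketch of such a wrapper (the http_get_checked name is invented for this example, not part of the http extension):

CREATE OR REPLACE FUNCTION http_get_checked(url text)
RETURNS jsonb
LANGUAGE plpgsql
AS $$
DECLARE
    r http_response;
BEGIN
    -- Make the request and keep the whole response record
    r := http_get(url);
    -- Fail loudly on anything other than a 200 OK
    IF r.status != 200 THEN
        RAISE EXCEPTION 'GET % returned HTTP status %', url, r.status;
    END IF;
    -- Parse the body into jsonb for the caller
    RETURN r.content::jsonb;
END;
$$;

-- Same feed as above, but now a non-200 response raises an error
SELECT jsonb_pretty(http_get_checked('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_week.geojson'));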

Reading JSON Features with Postgres JSON_TABLE

The feed is actually a GeoJSON FeatureCollection, which is just a container for a list of Features. In order to convert it to a table, we need to iterate through the list.

JSON_TABLE is part of the SQL/JSON standard, and allows users to filter parts of JSON documents using the JSONPath filter language.
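
As a quick illustration of the path language, the existing jsonb_path_query() function evaluates a JSONPath expression against a document and returns one result per row (the two-feature document here is made up):

SELECT jsonb_path_query(
    '{"features": [{"id": "quake1"}, {"id": "quake2"}]}'::jsonb,
    '$.features[*].id'
);
-- Returns two rows: "quake1" and "quake2"

JSON_TABLE uses the same path expressions, but maps the results into typed SQL columns rather than returning raw jsonb values.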

[Diagram: converting JSON documents to table rows and columns]

We can use JSON_TABLE to pull specific fields out of the JSON structure (using JSONPath expressions) and map them to corresponding SQL columns:


-- Download the GeoJSON feed from USGS
WITH http AS (
    SELECT * FROM
    http_get('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_week.geojson')
),
-- Filter the JSON feed into a record set, providing
    -- the type information and JSONPath for each column
jt AS (
    SELECT * FROM
    http,
    JSON_TABLE(content, '$.features[*]' COLUMNS (
        title text PATH '$.properties.title',
        mag real PATH '$.properties.mag',
        place text PATH '$.properties.place',
        ts text PATH '$.properties.time',
        url text PATH '$.properties.url',
        detail text PATH '$.properties.detail',
        felt integer PATH '$.properties.felt',
        id text PATH '$.id',
        geometry jsonb PATH '$.geometry'
    ))
)
SELECT * FROM jt;
  • The first argument is the JSON document, retrieved via http_get()
  • The second argument is the filter that generates the rows, in this case one row for each member of the features list in the GeoJSON FeatureCollection
  • The COLUMNS clause provides a path within each Feature to pull the column data from, and the database type to apply to that data. Most of the columns come from the GeoJSON properties, but others, like the id and geometry, come from other attributes.
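
To see those three pieces in isolation, here is a stripped-down JSON_TABLE call against a tiny hand-written document (the values are invented):

SELECT *
FROM JSON_TABLE(
    '{"features": [
        {"id": "ci12345", "properties": {"mag": 4.0, "place": "Somewhere, WA"}}
     ]}'::jsonb,
    '$.features[*]'
    COLUMNS (
        id    text PATH '$.id',
        mag   real PATH '$.properties.mag',
        place text PATH '$.properties.place'
    )
);

The result is one row per entry in the features array, with each column populated by its PATH expression.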

Transforms on JSON

Once we’re reading this JSON as SQL, we might want to do a few more things, like convert timestamps into a standard format, transform the geometry column, and add an SRID. So here’s a query, building on the one above, that does all of that. Note that you'll need PostGIS for the geometry functions.

-- Download the GeoJSON feed from USGS
WITH http AS (
    SELECT * FROM
    http_get('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_week.geojson')
),
-- Filter the JSON feed into a record set, providing
    -- the type information and JSONPath for each column
jt AS (
    SELECT * FROM
    http,
    JSON_TABLE(content, '$.features[*]' COLUMNS (
        title text PATH '$.properties.title',
        mag real PATH '$.properties.mag',
        place text PATH '$.properties.place',
        ts text PATH '$.properties.time',
        url text PATH '$.properties.url',
        detail text PATH '$.properties.detail',
        felt integer PATH '$.properties.felt',
        id text PATH '$.id',
        geometry jsonb PATH '$.geometry'
    ))
)
-- Apply any remaining transforms to the columns
-- in this case converting the epoch time into a timestamp
-- and the GeoJSON into a geometry
SELECT
    jt.title,
    jt.mag,
    jt.place,
    to_timestamp(jt.ts::bigint / 1000) AS ts,
    jt.url,
    jt.detail,
    jt.felt,
    jt.id,
    ST_SetSRID(ST_GeomFromGeoJSON(jt.geometry),4326) AS geom
FROM jt;
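
To keep the result around for repeated analysis, one option is to wrap the query in a materialized view and refresh it whenever you want a fresh copy of the feed. This is just a sketch: the quakes name is invented, and the column list is trimmed to keep it short.

CREATE MATERIALIZED VIEW quakes AS
WITH http AS (
    SELECT * FROM
    http_get('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_week.geojson')
),
jt AS (
    SELECT * FROM
    http,
    JSON_TABLE(content, '$.features[*]' COLUMNS (
        title text PATH '$.properties.title',
        mag real PATH '$.properties.mag',
        ts text PATH '$.properties.time',
        id text PATH '$.id',
        geometry jsonb PATH '$.geometry'
    ))
)
SELECT
    jt.id,
    jt.title,
    jt.mag,
    to_timestamp(jt.ts::bigint / 1000) AS ts,
    ST_SetSRID(ST_GeomFromGeoJSON(jt.geometry), 4326) AS geom
FROM jt;

-- Re-run the whole pipeline, including the HTTP request, on demand
REFRESH MATERIALIZED VIEW quakes;

-- Example follow-up: the strongest quakes in the current feed
SELECT title, mag, ts FROM quakes ORDER BY mag DESC LIMIT 5;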

Conclusion

Reading data from JSON files in the past might have involved writing functions in PL/pgSQL and building a complicated loop to iterate through each feature, creating relational rows of data. With JSON_TABLE:

  • You can read JSON from a URL or other source
  • You can extract specific fields (in this case magnitude, location, and time)
  • You can convert the data into usable formats with standard Postgres functions (in this example, PostGIS geometries and timestamp conversions)

Now we have JSON data in SQL format and we can easily do further analysis or visualization.

The JSON_TABLE documentation includes some much more complicated examples, but this basic pattern of JSON document handling probably covers 80% of use cases: pulling data from web APIs live into the database.