JSON Updates in Postgres 16

Postgres has been steadily building on the JSON functionality initially released more than 10 years ago. With Postgres 16, working with JSON has gotten a couple nice improvements. Primarily, this release added features that ease the manipulation of data into JSON and improve the standard SQL functionality using JSON.

TL;DR:

  • A SQL/JSON data-type check. For instance, this lets you ask with SQL if something value IS JSON OBJECT
  • Addition of SQL-standard JSON functions: JSON_ARRAY(), JSON_ARRAYAGG(), JSON_OBJECT(), and JSON_OBJECTAGG()

Data for this post

For a quick tutorial, use the following dataset:

CREATE TABLE user_profiles ( id serial PRIMARY KEY, name text NOT NULL, details jsonb );
INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'John Doe', '{"age": 30, "address": {"city": "New York", "state": "NY"}, "hobbies": ["reading", "hiking"]}'::jsonb
   )
,
   (
      'Jane Smith', '{"age": 25, "address": {"city": "Los Angeles", "state": "CA"}, "hobbies": ["painting", "cooking"]}'::jsonb
   )
,
   (
      'Emily Davis', '{"age": 29, "address": {"city": "Chicago", "state": "IL"}, "hobbies": ["dancing", "writing"]}'::jsonb
   )
;

IS JSON

Previously, to test if a field was JSON, your options were to run pg_typeof:

SELECT
   details,
   pg_typeof(details),
   pg_typeof(details ->> 'address')
FROM
   user_profiles LIMIT 1;

Which would return jsonb for that second column, and text for the 3rd column. The problem with this is that it didn’t give you any inspection into the nested values within JSON, so running pg_typeof(details->>'address') would just tell you it is text. Now, we can do something like:

SELECT
   details,
   details IS JSON,
   details ->> 'address' IS JSON OBJECT
FROM
   user_profiles LIMIT 1;

Using this new functionality, the IS JSON returns true and the IS JSON OBJECT returns true as well. Previously, when building queries to inspect JSON, you were at risk of failed queries if the data’s JSON format did not match expected values. Imagine you are migrating your JSON structure, or you previously didn’t have a well defined schema, so let’s throw in some bad details data that mismatches our current structure:

INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'Bruce Wayne', '"Gotham City, NY"'
   )
;
INSERT INTO
   user_profiles (name, details)
VALUES
   (
      'Clark J. Kent', '{"age": 32, "address": [{"city": "New York", "state": "AL"}]}'
   )
;

Both of the values provided above are valid JSON, yet the existing structure was of the format {"addresses": []}. Now, when extracting the first address for a field, it would look like this:

SELECT
   CASE
      WHEN
         details -> 'address' IS JSON ARRAY
      THEN
(details -> 'address')[0]
      WHEN
         details -> 'address' IS JSON OBJECT
      THEN
         details -> 'address'
      WHEN
         details IS JSON SCALAR
      THEN
         details
   END
   AS primary_address
FROM
   user_profiles;

Additionally, because JSON is loosely validated, you can inspect UNIQUE keys or WITHOUT UNIQUE keys:

SELECT
	'{"batman": "robin", "batman": "robin"}' IS JSON WITH UNIQUE KEYS,
	'{"batman": "robin", "batman": "robin"}' IS JSON WITHOUT UNIQUE KEYS;

JSON_ARRAY & JSON_ARRAYAGG

Now, we can interact with JSON in a more-standards compliant way. The json and jsonb constructs are a Postgres implementation, but not a SQL implementation. Thus, the existing Postgres functions were prefixed with jsonb_ or json. These new functions get away from those prefixes, and use keywords to specify output.

Let’s look at json_array. If you have previously done this, you used json_build_array / jsonb_build_array, or the aggregate functions json_agg / jsonb_agg. Now, you can combine values using json_array:

SELECT
   json_array(name, details ->> 'age')
FROM
   user_profiles;

Or, use it to build an array on a GROUP BY using json_arrayagg. Below, we write a SQL query that returns a JSON array of each person within each age group.

SELECT
((details ->> 'age')::integer / 10) * 10 AS age_group,
   json_arrayagg(name)
FROM
   user_profiles
GROUP BY
   1;

We can also use json_arrayagg in window functions. Below, we use a window function to find the other group members for each age group.

SELECT
   name,
   (
(details ->> 'age')::integer / 10
   )
   * 10 AS age_group,
   json_arrayagg(name) OVER (PARTITION BY (details ->> 'age')::integer / 10 * 10) AS other_group_members
FROM
   user_profiles;

Disclaimer: I would write this differently using a dedicated CTE and GROUP BY, but this is an example of using json_arrayagg as a window function using the small dataset.

JSON_OBJECTAGG

Just as we can aggregate values into an array, we can also aggregate values into an object. Previously, when using json(b)?_build_object or json(b)?_object_agg to construct objects. With Postgres 16, we have a standard SQL function called json_object with a syntax that moved away from tuples, and uses either value or : to separate key-value pairs.

SELECT
   json_object('name' value name, 'age': details ->> 'age')
FROM
   user_profiles;

The aggregations form is as follows:

SELECT
((details ->> 'age')::integer / 10) * 10 AS age_group,
   json_objectagg(name value details ->> 'age')
FROM
   user_profiles
GROUP BY
   1;

As with the json_arrayagg, the json_objectagg can be used in window functions:

SELECT
   name,
   (
(details ->> 'age')::integer / 10
   )
   * 10 AS age_group,
   json_objectagg(name value details ->> 'age') OVER (PARTITION BY (details ->> 'age')::integer / 10 * 10) AS other_group_members
FROM
   user_profiles;

Modifying behavior with keywords

For both json_array and json_object, they come with modifying keywords. json_array gets ABSENT and RETURNING. json_object gets ABSENT, UNIQUE, and RETURNING.

UNIQUE

Unique constraint on json_object will throw errors on duplicate keys, but by default, it will not error. Try the following:

SELECT
   json_object('key_1' value 'value_1', 'key_1' value 'value_2' WITH UNIQUE);
SELECT
   json_object('key_1' value 'value_1', 'key_1' value 'value_2');

ABSENT ON NULL v. NULL ON NULL

Absent, or more precisely ABSENT ON NULL constraint instructs json_object and json_array to omit null values:

SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' ABSENT
   ON NULL);
SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' NULL
   ON NULL);

Absent also works on json_array:

SELECT
   json_array('value_1', NULL, 'value_3' ABSENT
   ON NULL);
SELECT
   json_array('value_1', NULL, 'value_3' NULL
   ON NULL);

The opposing functionality of ABSENT ON NULL is the NULL ON NULL as above. This means the following is a valid SQL statement in Postgres 16:

SELECT
   json_array(NULL NULL
   ON NULL);

RETURNING

Have you wondered why we didn’t have a jsonb variant of this json_object function? It’s because the output is dictated by a keyword instead of the function name. Using RETURNING to specify the returned object type:

SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' RETURNING jsonb);
SELECT
   json_object('key_1' value 'value_1', 'key_2' value NULL, 'key_3' value 'value_3' RETURNING json);
SELECT
   json_array('value_1', NULL, 'value_3' RETURNING jsonb);
SELECT
   json_array('value_1', NULL, 'value_3' RETURNING json);

Using the pg_typeof function, you can see the different data types that are returned by each statement above.

SELECT
   pg_typeof(json_array('value_1', NULL, 'value_3' RETURNING jsonb));

Below is an example of using all 3 of the keywords together:

SELECT json_object(
	'key_1' value 'value_1',
	'key_2' value NULL,
	'key_3' value 'value_3'
	ABSENT ON NULL WITH UNIQUE RETURNING jsonb);

This functionality is added to the existing functions as suffixes. The function json_object_agg received a strict, unique, and a unique_strict suffix as follows:

  • json_object_agg_strict / jsonb_object_agg_strict
  • json_object_agg_unique / jsonb_object_agg_unique
  • json_object_agg_unique_strict / jsonb_object_agg_unique_strict

For instance, using jsonb_object_agg_unique_strict would look like the following:

SELECT
	json_object_agg_unique_strict(key, value)
FROM (VALUES
		('key_1', 'value_1'),
		('key_2', NULL),
		('key_3', 'value_3')
	) AS t(key, value);

The unique suffix specifies to throw an error on duplicate keys, and strict suffix skips null values. Of course, combining unique_strict suffixes specifies both. Given that I appreciate Postgres’ pursuit of the SQL standards, I would lean toward using new JSON_OBJECT and JSON_ARRAY functions + their keywords.

Postgres is a first-class JSON database

As we’ve said before, Postgres 16 is a first-class database for JSON (read Craig’s post on the evolution of JSON and JSONB). Given JSON was first introduced in 9.4 and now we are on version 16, the continued rollout of JSON improvement means the capabilities are thriving in the Postgres ecosystem. We know lots of folks are using JSON in Postgres and we’re happy to see several hands on and under the hood improvements.

Avatar for Christopher Winslett

Written by

Christopher Winslett

September 20, 2023 More by this author