Indexing JSONB in Postgres

Postgres is amazing for many reasons. One area that doesn't get near enough attention is datatypes. Postgres has a rich set of datatypes and one important one for devs to be especially excited about is JSONB.

JSONB which is structured and indexable. In JSON, the B stands for binary (or as we like to think of it B is for better), which means data is pre-parsed as it is stored. How do you get the most out of JSONB from a retrieval perspective? Enter Postgres' rich indexing support.

Postgres index types

Most databases have a standard single index type: B-Tree. B-Tree is a balanced tree structure, and the common type of index you learn about from a CS degree perspective. When you do a standard CREATE INDEX a B-Tree index is what is created. This works for standard WHERE clauses that target that value.

But Postgres has other index types including:

GIN - Generalized Inverted Index
GiST - Generated Search Tree
Sp-GiST - Space-Partitioned Generalized Search Tree
BRIN - Block Range Index

So the go-to Postgres database, the B-Tree, isn’t suited well for JSON documents, or at least not how you may think, because of the nature of nested structures. So how do you index your JSONB to more efficiently query it? Enter GIN indexes.

GIN indexes for JSONB

Instead of indexing the entire JSONB document as a unit, a GIN index breaks it apart and indexes the keys and values inside. Think of it as creating a giant lookup table under the hood. If you have a row like this:

{
  "status": "active",
  "plan": "pro"
}

A GIN index on this row will store entries like:

status => active

plan => pro

This makes GIN ideal for answering questions like:

SELECT *
FROM my_table
WHERE data @> '{"status": "active"}';

The @> operator (JSONB containment) is GIN indexable, and GIN can quickly find documents where that key value pair exists.

Creating the index is straightforward:

CREATE INDEX idx_data_gin
ON my_table
USING gin (data);

You can also be more specific with a partial or expression index if you only need to index a subset of keys:

CREATE INDEX idx_status_gin
ON my_table
USING gin ((data->'status'));

What queries use the GIN index?

The key here is: not all JSONB queries will benefit from a GIN index. Queries that can use GIN include:

Containment: data @> '{"plan": "pro"}'
Key existence: data ? 'status'
Any key match: data ?| array['plan', 'tier']
All keys match: data ?& array['plan', 'status']

These are operator-based queries that map well to the inverted index structure.

What JSONB queries don’t use GIN?

Here’s where sometimes you can get caught off guard. You added a GIN index, set up the query in your app, and have the worst of both worlds: an index being maintained but slow queries because they can't use the index.

GIN indexes won’t help with:

Path-based navigation: data->'user'->>'email' = 'craig@example.com'
Comparisons within the JSONB: (data->>'age')::int > 30
Regex or pattern matches inside values: data->>'name' ILIKE 'craig%'

Maintenance with GIN and JSONB

While GIN indexes are powerful, they have a larger write overhead than standard B-tree indexes. This overhead becomes especially apparent if you're frequently updating large JSONB columns. Frequent large updates can lead to index bloat, where the index contains too many references to dead rows and becomes inefficient.

Due to the potential for bloat, actively monitoring the health of your GIN indexes is key. You can manage this by periodically running the REINDEX CONCURRENTLY command to rebuild the index and reclaim wasted space. We also recommend using internal tools like the pgstattuple extension to check the index's status and identify bloat before it becomes a significant issue.

Expression indexes for JSONB

For some cases, creating a typical B-tree expression index can help with JSON that doesn’t fit the GIN use cases. Creating an expression index involves defining an index not just a part of the array, but on the result of an operation performed on that column. For instance:

CREATE INDEX idx_orders_total ON orders (((details->>'order_total')::numeric))

This builds an index on the orders table. It works by first accessing the details jsonb column, extracting the value associated with the order_total key as text using the ->> operator, and then casting that text value to a numeric type. Postgres then builds a standard B-tree index on these resulting numeric values. This can be efficient for range scans and sorting inside these JSONB rows.

Keep in mind that a requirement for using an expression index is that the WHERE clause of your query must exactly match the expression used to define the index. For the index created above, a query like WHERE (details->>'order_total')::numeric > 100 would use the index, but a slightly different query, such as WHERE (details->>'order_total')::float > 100, would not.

Strict matching means expression indexes are great for optimizing well-defined, static queries that are embedded in your application's code. But not queries that often change.

Best practices for JSONB indexing

Use GIN for containment-style lookups, especially if you don't know the full schema ahead of time.
Don’t GIN the whole JSONB column if you only ever query specific keys—use expression indexes or partial indexes instead.
Combining GIN with traditional B-tree indexes (ie expression indexes) on structured columns is the key to keeping performance predictable.

JSONB is a powerful tool in the PostgreSQL toolbox, but to unlock its performance potential, you have to understand the indexing story. GIN indexes are your best friend when you need to query inside documents but they're not a silver bullet. Knowing when and how to use them is key.

If you're working with JSONB-heavy workloads in production, this is one of those "measure twice, index once" situations. If you’re on Crunchy Bridge or managing a large Postgres fleet, having observability into which indexes are getting used (and which aren’t) is just as important.

postgres index types

Latest Articles