CrunchyData Blog

Logical replication from Postgres to Iceberg

Marco.Slot@crunchydata.com (Marco Slot) — Tue, 22 Apr 2025 09:00:00 EDT

Operational and analytical workloads have historically been handled by separate database systems, though they are starting to converge. We built Crunchy Data Warehouse to put PostgreSQL at the frontier of analytics systems, using modern technologies like Iceberg and a hybrid query engine.

Combining operational and analytical capabilities is extremely useful, but it is not meant to drive all your workloads into a single system. In most organizations, application developers and analysts work in different teams with different requirements on data modeling, resource management, operational practices, and various other aspects.

What will always be needed is a way to bring data and the stream of changes from an operational database into a separate analytics system. As it turns out, if both sides are PostgreSQL, magical things can happen…

Today, we are announcing the availability of native logical replication from Postgres tables in any Postgres server to Iceberg tables managed by Crunchy Data Warehouse.

The latest release of Crunchy Data Warehouse includes full support for:

Insert, update, delete, and truncate replication into Iceberg
High transaction rates
Low apply lag (under 60 seconds)
Preservation of transaction boundaries–foreign key constraints still hold
Automatic table creation and data copy
Automatic compaction
Advanced replication protocol features like row filters, streaming (v4 protocol), and failover slots.
Automatic handling of TOAST columns
Ability to rebuild tables while old data remains readable

While it sounds like something from the future, logical replication to Iceberg is available right now on Crunchy Bridge, and will be available for self-managed users in the next release of Crunchy Postgres for Kubernetes.

Setting up logical replication into Iceberg

Getting started with logical replication to Iceberg is very simple. You can literally set up everything with just 2 commands.

On the source:

create publication pub for table chats, users;

On Crunchy Data Warehouse, after ensuring connectivity to the source:

create subscription sub connection '...' publication pub with (create_tables_using = 'iceberg');

The create subscription command will create Iceberg tables for all tables in the publication, then copy the initial data in the background, and then replicate changes. You can also set up the Iceberg tables manually before creating the subscription.

You can run high performance analytical queries and data transformations directly on the Iceberg tables in Crunchy Data Warehouse once the initial data copy completes, or use other query engines with the SQL/JDBC Iceberg catalog driver.

How Postgres-to-Iceberg replication works

Conventional tools for applying a stream of changes to a data warehouse take large batches and apply them using merge commands. While effective, the computational cost of running these commands is relatively high, and increases significantly as the table grows.

We invented several new techniques to apply insertions and deletions to Iceberg in micro batches by taking advantage of Postgres’ transactional capabilities. Queries use an efficient merge-on-read method to apply deletions. Insertion and deletion files are later merged during automatic compaction, and compaction only accesses files that were (significantly) modified.

What that means is that replication can be sustained with relatively low lag and low overhead. The main cost is that the replication requires some disk space, though usually much less than the source data.

Get started with replication to your Postgres Data Warehouse

Our goal is to bring all PostgreSQL features and extensions to Iceberg with high performance analytics. Logical replication is a useful Postgres feature that becomes essential in the context of a data warehouse, given the need to synchronize data from operational databases.

Of course, PostgreSQL isn’t perfect. Where possible we try to go the extra mile to build a seamless experience, for instance by enabling automatic Iceberg table creation in CREATE SUBSCRIPTION. There are many other ways in which we think the logical replication experience can be improved, especially for Iceberg, so this is the start of a journey.

If you want to get started with this seamless Postgres -> Iceberg replication experience we encourage you to reach out to us or check out the documentation.

Hacking the Postgres Statistics Tables for Faster Queries

Louise.Grandjonc.Leinweber@crunchydata.com (Louise Grandjonc Leinweber) — Wed, 16 Apr 2025 09:00:00 EDT

Postgres does a great job of making queries efficient. By gathering data in internal statistics tables, Postgres estimates many things before a query runs, such as whether an index scan will be better than a sequential scan, or how best to fetch the rows matching a WHERE clause.

What Postgres doesn’t know… is how your columns are related to each other. Postgres isn’t a machine learning algorithm. It is not going to learn over time, as you query things, what is related and what isn't. It uses the same statistical probabilities regardless of the content in your columns. But don’t get too discouraged. You can help the planner along. You can actually add table statistics and tell Postgres about your data to help with query performance.

On today’s blog, let’s walk through how table statistics work and how you can add statistics for relatedness. This blog is based on a talk given at Postgres Conference Europe, Deep Dive into Postgres Table Statistics. Only part of the talk is covered here.

There are some sample EXPLAIN plans and data in here from a database of clinical trials, available as a Postgres dump file at https://aact.ctti-clinicaltrials.org.

What statistics are gathered

Postgres gathers statistics on your tables when you run ANALYZE or when autovacuum runs automatically.

For each column, depending on its type, it will gather the following statistics:

Distinct values: An estimate of the number of unique values in a column.
Average data width: The typical size of values in a column.
Null fraction: The proportion of NULL values in a column.
Correlation: Varies from -1 to 1. Describes the correlation between the physical order of your tuples and the logical order of this column's values. For example, if you use a serial bigint for your id, the correlation will be closer to 1 than if you use uuid.
Most common values (MCV) and their frequencies.
Histograms: Describe the data distribution outside of the most common values

Postgres provides the pg_stats view, which shows a user-friendly version of the pg_statistic catalog table. The pg_statistic table is optimized for disk space and isn’t easy for users to look at directly.

Let's look at a couple examples of statistics that you can find in the view.

Most common column values (MCV) and their frequency

SELECT * FROM pg_stats WHERE tablename = 'studies' AND attname = 'study_type';
-[ RECORD 1 ]----------+-----------------------------------------------
schemaname             | ctgov
tablename              | studies
attname                | study_type
inherited              | f
null_frac              | 0.0021
avg_width              | 14
n_distinct             | 3
most_common_vals       | {INTERVENTIONAL,OBSERVATIONAL,EXPANDED_ACCESS}
most_common_freqs      | {0.7689667,0.2265,0.0024333333}
histogram_bounds       | (null)
correlation            | 0.64599395
most_common_elems      | (null)
most_common_elem_freqs | (null)
elem_count_histogram   | (null)

Here, the histogram is null because all values are covered in the MCV (it's an enum). The frequency of the value INTERVENTIONAL is 76.9%.

Histograms of column value frequency

SELECT * FROM pg_stats WHERE tablename = 'baseline_counts' AND attname = 'count';
schemaname             | ctgov
tablename              | baseline_counts
attname                | count
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 1729
most_common_vals       | {6,3,10,..,94,104}
most_common_freqs      | {0.0328,0.0254,0.023,..,0.0019,0.0019}
histogram_bounds       | {83,84,88,89,95,97,99,107,108,111,113,115,117,118,121,123,125,126,129,132,134,136,139,142,145,147,149,152,155,158,161,163,166,169,173,176,179,182,186,191,195,199,201,204,207,212,216,220,224,229,235,240,245,250,255,261,267,274,280,287,296,301,308,315,324,333,344,352,362,375,388,400,410,424,442,456,478,495,511,530,554,582,607,642,680,722,774,821,884,965,1057,1172,1305,1518,1751,2159,3031,4357,6817,15871,2622164}
correlation            | -0.0021751618
most_common_elems      | (null)
most_common_elem_freqs | (null)
elem_count_histogram   | (null)

A histogram is made of buckets. Each bucket should contain, roughly, the same percentage of rows. In this specific example, there should be around the same number of rows in baseline_counts, where the count is between 83 and 84, as between 245 and 250.
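To make the equal-height idea concrete, here is a minimal Python sketch (not Postgres code) that estimates the fraction of histogram-covered rows below a value. It counts the whole buckets to the left of the value and interpolates linearly inside the partial bucket. The short `bounds` list reuses the first few bounds shown above as if they were a complete histogram, which is an assumption made purely for illustration.

```python
from bisect import bisect_right

def histogram_fraction_below(bounds, value):
    """Approximate fraction of histogram-covered rows with column < value.

    bounds has n+1 entries delimiting n buckets, each assumed to hold
    roughly 1/n of the rows that are not in the MCV list.
    """
    n_buckets = len(bounds) - 1
    if value <= bounds[0]:
        return 0.0
    if value >= bounds[-1]:
        return 1.0
    # Index of the bucket that contains value
    i = bisect_right(bounds, value) - 1
    lo, hi = bounds[i], bounds[i + 1]
    # Linear interpolation inside the partially covered bucket
    partial = (value - lo) / (hi - lo) if hi > lo else 0.5
    return (i + partial) / n_buckets

# Hypothetical 4-bucket histogram: each bucket covers about 25% of the rows
bounds = [83, 84, 88, 89, 95]
print(histogram_fraction_below(bounds, 86))
```

Narrow buckets (83 to 84) and wide buckets (89 to 95) count the same here, which is exactly why the bounds are packed tightly where the data is dense.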

How Postgres uses statistics in query planning

Postgres uses statistics to estimate query costs and select the most efficient execution plan. It considers:

How many rows will be returned
The total data size
The number of disk pages that need to be scanned

Selectivity is the fraction of rows a query will return. The query planner relies on selectivity estimates to determine, for example, whether to use an index scan or a sequential scan. If a query filters out most rows, an index scan is preferred. If a query returns most rows, a sequential scan is better.

The selectivity of a single clause

For WHERE column = value. If the value is in the MCV list, Postgres uses the stored frequency. If not, it assumes an even distribution of non-MCV values and estimates selectivity accordingly.
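The equality rule can be sketched in a few lines of Python. This is a simplified model of the logic described above, not the actual Postgres implementation. The function name and arguments are made up for this example, and the second column (values `a`, `b`, `z`) is entirely hypothetical.

```python
def eq_selectivity(value, mcv_vals, mcv_freqs, null_frac, n_distinct):
    """Estimate the selectivity of `column = value` from per-column statistics."""
    # Case 1: the value is a most common value, so use its stored frequency
    if value in mcv_vals:
        return mcv_freqs[mcv_vals.index(value)]
    # Case 2: spread the remaining rows evenly over the remaining distinct values
    remaining = max(1.0 - null_frac - sum(mcv_freqs), 0.0)
    other_distinct = n_distinct - len(mcv_vals)
    if other_distinct > 1:
        remaining /= other_distinct
    # A non-MCV value should not look more common than the rarest MCV
    if mcv_freqs:
        remaining = min(remaining, min(mcv_freqs))
    return remaining

# Statistics of studies.study_type from the pg_stats output above
vals = ["INTERVENTIONAL", "OBSERVATIONAL", "EXPANDED_ACCESS"]
freqs = [0.7689667, 0.2265, 0.0024333333]
print(eq_selectivity("INTERVENTIONAL", vals, freqs, 0.0021, 3))

# Hypothetical column: 2 MCVs covering 50% of rows, 52 distinct values in total
print(eq_selectivity("z", ["a", "b"], [0.3, 0.2], 0.0, 52))
```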

For scalar queries (WHERE column < value or WHERE column > value). The optimizer uses both MCVs and histograms to estimate selectivity.

Postgres will gather the following elements:

The sum of all MCV selectivities (sumcommon)
The fraction of null values (nullfrac)
The MCV selectivity (mcv_select): This is the sum of the frequency of MCVs matching the clause
The histogram selectivity (hist_select): This is the fraction of buckets matching the clause. To get that, Postgres loops through the histogram and counts the number of buckets matching the clause. The histogram selectivity is the number of matching buckets divided by the total number of buckets

Postgres will then use all of this to calculate the selectivity of our clause.

Initialize selectivity:

select = 1.0 - nullfrac - sumcommon

Merge the histogram selectivity:

select *= hist_select

Merge the MCV selectivity:

select += mcv_select
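Put together, the three steps above can be written as one small Python function. This is a sketch of the arithmetic only, using the names from the list above. All input numbers are hypothetical.

```python
def scalar_selectivity(null_frac, mcv_freqs, mcv_matches, hist_select):
    """Combine MCV and histogram information for a clause like `column < value`.

    mcv_matches[i] is True when the i-th most common value satisfies the clause.
    """
    sumcommon = sum(mcv_freqs)
    mcv_select = sum(f for f, m in zip(mcv_freqs, mcv_matches) if m)
    # Start with the share of rows that are neither NULL nor in the MCV list
    select = 1.0 - null_frac - sumcommon
    # Keep only the part of that share the histogram says matches the clause
    select *= hist_select
    # Add back the MCV rows that match the clause
    select += mcv_select
    return select

# Hypothetical column: 10% NULLs, two MCVs (20% and 10%), the first one matches,
# and half of the histogram buckets match the clause.
print(scalar_selectivity(0.1, [0.2, 0.1], [True, False], 0.5))
```

With these inputs, 60% of the rows are covered by the histogram, half of those match (30%), and the matching MCV adds another 20%.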

Combining clauses

Most queries have more than one WHERE clause. The planner estimates selectivity for each column separately and multiplies them.

Let's look at this query:

SELECT * FROM studies WHERE phase = 'PHASE1' AND brief_title ILIKE '%diabetes%';

Merging by multiplying means that out of the 8.5% of studies in phase 1, 2% have diabetes in their title.

By default, Postgres assumes columns are independent when estimating query results. This can lead to inaccurate estimates.

For example, in this query:

SELECT nct_id, name FROM facilities WHERE city = 'Lyon' AND country = 'France';

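Postgres multiplies the selectivity of city = 'Lyon' by the selectivity of country = 'France'. However, 100% of cities named Lyon are in France (there is actually a Lyon, Texas, but it is not relevant to this specific database), so this approach underestimates row counts, leading to a bad query plan.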

Extended statistics

You can manually force Postgres to link two columns. Postgres supports extended statistics, including:

Functional dependencies
Multivariate distinct counts
Multivariate most common values

Creating statistics works in two parts:

Adding the statistic on the columns and table through CREATE STATISTICS
Running a table ANALYZE

Dependencies

Functional dependency describes a dependency between two columns.

It can be because there is a relationship between them (city and country for example) or because the values of two columns vary together (column a = column b + 1).

Extended Statistics Examples - Dependencies

EXPLAIN ANALYZE SELECT nct_id, name FROM facilities WHERE city = 'Lyon' AND country = 'France';
                                                                  QUERY PLAN
---------------
 Index Scan using index_facilities_on_city on facilities  (cost=0.43..375.61 rows=19 width=47) (actual time=1.708..1839.398 rows=5816 loops=1)
   Index Cond: ((city)::text = 'Lyon'::text)
   Filter: ((country)::text = 'France'::text)
 Planning Time: 0.097 ms
 Execution Time: 1840.283 ms
(5 rows)

In the explain plan, you might notice that the estimated rows was 19, and the actual number was 5816. This caused Postgres to pick a less efficient plan.

Here is how it calculated the selectivity:

rows=19
selectivity France: 0.060533334 (6%), France is in the MCV, so we have the frequency
selectivity Lyon: 0.000103 (0.01%), Lyon is not in the MCV, so it calculated the selectivity as being the same for any city not in it.
reltuples: 3132540

3132540 * 0.060533334 * 0.000103 = 19
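The same arithmetic as a small Python sketch, assuming only the independence rule described earlier (multiply the table's row count by each clause's selectivity). The function name is made up for this example.

```python
def estimated_rows(reltuples, selectivities):
    """Row estimate under the independence assumption."""
    rows = float(reltuples)
    for s in selectivities:
        rows *= s
    return rows

# reltuples, selectivity of country = 'France', selectivity of city = 'Lyon'
est = estimated_rows(3132540, [0.060533334, 0.000103])
print(f"{est:.1f}")
```

The result lands between 19 and 20. The selectivities printed above are rounded, so the last digit can differ slightly from the planner's rows=19.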

Postgres makes two mistakes:

Assuming that the cities, outside of the MCV are evenly distributed
Assuming that only 6% of cities named Lyon are in France

Let's manually add extended statistics:

CREATE STATISTICS (dependencies) ON country, city FROM facilities;
ANALYZE facilities;
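To get an intuition for what a functional dependency changes, here is a simplified Python sketch. It assumes the blending rule described in comments in recent PostgreSQL sources, where a dependency degree f between 0 and 1 moves the estimate from the independent product towards the selectivity of the more restrictive clause. Older releases used a slightly different formula. The degree and selectivity values below are hypothetical, not taken from the facilities table.

```python
def dependency_selectivity(p_a, p_b, degree):
    """Blend the dependent and independent estimates for `a = x AND b = y`.

    degree = 0.0 means the columns look independent,
    degree = 1.0 means one column fully determines the other.
    """
    dependent = min(p_a, p_b)   # the second clause filters nothing extra
    independent = p_a * p_b     # plain multiplication of selectivities
    return degree * dependent + (1.0 - degree) * independent

p_city, p_country = 0.002, 0.06  # hypothetical selectivities
for degree in (0.0, 0.9, 1.0):
    print(degree, dependency_selectivity(p_city, p_country, degree))
```

With a degree close to 1, the country clause barely reduces the estimate. That matches the intuition that almost every row with city = 'Lyon' also has country = 'France'.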

Re-run the EXPLAIN plan

EXPLAIN ANALYZE SELECT nct_id, name FROM facilities WHERE city = 'Lyon' AND country = 'France';
                                                                       QUERY PLAN
------------------------
 Bitmap Heap Scan on facilities  (cost=1586.13..1985.73 rows=5845 width=47) (actual time=13.080..18.711 rows=5816 loops=1)
   Recheck Cond: (((city)::text = 'Lyon'::text) AND ((country)::text = 'France'::text))
   Heap Blocks: exact=4772
   ->  BitmapAnd  (cost=1586.13..1586.13 rows=362 width=0) (actual time=12.436..12.437 rows=0 loops=1)
         ->  Bitmap Index Scan on index_facilities_on_city  (cost=0.00..56.23 rows=6414 width=0) (actual time=0.431..0.431 rows=5816 loops=1)
               Index Cond: ((city)::text = 'Lyon'::text)
         ->  Bitmap Index Scan on index_facilities_on_country  (cost=0.00..1526.73 rows=180773 width=0) (actual time=11.802..11.802 rows=185561 loops=1)
               Index Cond: ((country)::text = 'France'::text)
 Planning Time: 0.316 ms
 Execution Time: 18.974 ms
(10 rows)

So from 1840.283 ms to 18.974 ms: that’s 97x faster 🔥!

Now the estimated rows and the actual rows are close. Postgres was able to pick the proper scan to handle this query more efficiently, as can be seen in the execution time.
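When comparing plans like these, it can help to put a number on how far off an estimate is. The Python sketch below computes a simple misestimate factor, the larger of estimated/actual and actual/estimated, for the two plans above. The helper name is made up for this example and is not a Postgres metric.

```python
def misestimate_factor(estimated, actual):
    """How many times off the row estimate is, regardless of direction."""
    # Treat zero rows as one row to avoid dividing by zero
    estimated = max(estimated, 1)
    actual = max(actual, 1)
    return max(estimated / actual, actual / estimated)

before = misestimate_factor(19, 5816)    # plan without extended statistics
after = misestimate_factor(5845, 5816)   # plan with the dependencies statistic
print(f"before: {before:.1f}x off, after: {after:.3f}x off")
```

A factor near 1 means the planner had an accurate picture of the data. A factor in the hundreds, as in the first plan, is a good hint that statistics are worth a look.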

Ndistinct

By default, Postgres assumes that the distinct count of grouped columns is independent. This can lead to incorrect cardinality estimates when the columns are actually correlated.

Correlated columns are common in parent-child data sets. For example, here category and title are correlated, and each category only has a limited set of title values.

When you create statistics using ndistinct, Postgres collects and stores information about how many distinct values exist in a combination of columns. This helps improve the planner's estimates for queries that involve GROUP BY, DISTINCT, or filtering conditions across multiple columns.
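To illustrate why the combined count matters, here is a deliberately simplified Python model of group-count estimation. It assumes the planner multiplies per-column distinct counts when it has nothing better, capped at the number of rows. The real estimator applies further adjustments, and all numbers below are hypothetical.

```python
def groups_independent(row_count, distinct_counts):
    """Group estimate when columns are treated as independent."""
    estimate = 1
    for nd in distinct_counts:
        estimate *= nd
    # There can never be more groups than rows
    return min(estimate, row_count)

def groups_with_ndistinct(row_count, combined_ndistinct):
    """Group estimate when a multivariate ndistinct statistic is available."""
    return min(combined_ndistinct, row_count)

rows = 1_000_000
# Hypothetical: 50 categories and 2,000 titles, but only 2,500 real combinations
print(groups_independent(rows, [50, 2000]))
print(groups_with_ndistinct(rows, 2500))
```

An inflated group count makes sort-based or disk-spilling aggregation look necessary, while an accurate one can make an in-memory hash aggregate the obvious choice.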

To improve that you can do:

CREATE STATISTICS (ndistinct) on category, title FROM baseline_measurements;
ANALYZE baseline_measurements;

Before the statistics

EXPLAIN ANALYZE SELECT category, title, SUM(number_analyzed)
FROM baseline_measurements
WHERE category IS NOT NULL
GROUP BY category, title
ORDER BY 3 DESC
LIMIT 10;
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=210450.73..210450.76 rows=10 width=44) (actual time=907.290..907.311 rows=10 loops=1)
   ->  Sort  (cost=210450.73..211027.87 rows=230854 width=44) (actual time=907.288..907.310 rows=10 loops=1)
         Sort Key: (sum(number_analyzed)) DESC
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Finalize GroupAggregate  (cost=138119.41..205462.06 rows=230854 width=44) (actual time=609.776..903.558 rows=38379 loops=1)
               Group Key: category, title
               ->  Gather Merge  (cost=138119.41..199690.71 rows=461708 width=44) (actual time=609.770..895.979 rows=42241 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Partial GroupAggregate  (cost=137119.39..145398.13 rows=230854 width=44) (actual time=589.234..763.160 rows=14080 loops=3)
                           Group Key: category, title
                           ->  Sort  (cost=137119.39..138611.94 rows=597020 width=40) (actual time=589.220..724.301 rows=482252 loops=3)
                                 Sort Key: category, title
                                 Sort Method: external merge  Disk: 24272kB
                                 Worker 0:  Sort Method: external merge  Disk: 22456kB
                                 Worker 1:  Sort Method: external merge  Disk: 24128kB
                                 ->  Parallel Seq Scan on baseline_measurements  (cost=0.00..63515.52 rows=597020 width=40) (actual time=0.051..98.694 rows=482252 loops=3)
                                       Filter: (category IS NOT NULL)
                                       Rows Removed by Filter: 287745
 Planning Time: 0.202 ms
 Execution Time: 908.366 ms
(21 rows)

After

EXPLAIN ANALYZE SELECT category, title, SUM(number_analyzed)
FROM baseline_measurements
WHERE category IS NOT NULL
GROUP BY category, title
ORDER BY 3 DESC
LIMIT 10;
                                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=71080.18..71080.21 rows=10 width=43) (actual time=341.044..341.069 rows=10 loops=1)
   ->  Sort  (cost=71080.18..71094.80 rows=5849 width=43) (actual time=341.043..341.067 rows=10 loops=1)
         Sort Key: (sum(number_analyzed)) DESC
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Finalize GroupAggregate  (cost=69442.70..70953.79 rows=5849 width=43) (actual time=262.366..337.346 rows=38379 loops=1)
               Group Key: category, title
               ->  Gather Merge  (cost=69442.70..70807.56 rows=11698 width=43) (actual time=262.239..330.513 rows=42726 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Sort  (cost=68442.68..68457.30 rows=5849 width=43) (actual time=258.648..259.082 rows=14242 loops=3)
                           Sort Key: category, title
                           Sort Method: quicksort  Memory: 1475kB
                           Worker 0:  Sort Method: quicksort  Memory: 1466kB
                           Worker 1:  Sort Method: quicksort  Memory: 1467kB
                           ->  Partial HashAggregate  (cost=68018.21..68076.71 rows=5849 width=43) (actual time=170.306..171.412 rows=14242 loops=3)
                                 Group Key: category, title
                                 Batches: 1  Memory Usage: 2577kB
                                 Worker 0:  Batches: 1  Memory Usage: 2577kB
                                 Worker 1:  Batches: 1  Memory Usage: 2577kB
                                 ->  Parallel Seq Scan on baseline_measurements  (cost=0.00..63511.45 rows=600902 width=39) (actual time=0.042..88.588 rows=482252 loops=3)
                                       Filter: (category IS NOT NULL)
                                       Rows Removed by Filter: 287745
 Planning Time: 0.205 ms
 Execution Time: 341.418 ms
(24 rows)

Here, after creating statistics, Postgres was able to pick an in-memory HashAggregate, which made the query almost 3 times faster (908 ms down to 341 ms).

Multivariate MCV

MCV (Most Common Values) statistics help Postgres optimize query planning by tracking the most frequently occurring values in one or more columns. These statistics improve the accuracy of selectivity estimates, particularly for queries with filters like WHERE column = value.

By default, Postgres automatically collects MCV statistics for individual columns, but you can manually add statistics for correlated columns. A multivariate MCV list captures the most frequently occurring combinations of values (not just individual values). This helps the planner make better row count estimates when filtering on these columns.

EXPLAIN ANALYZE
SELECT nct_id, organ_system, adverse_event_term,
frequency_threshold
FROM reported_events
WHERE organ_system = 'Respiratory, thoracic and mediastinal disorders'
AND adverse_event_term = 'Hypoxia'
ORDER BY frequency_threshold DESC
LIMIT 10;
                                                                            QUERY PLAN
-----------------------------------------------------
 Limit  (cost=69.99..70.01 rows=10 width=63) (actual time=16.643..16.645 rows=10 loops=1)
   ->  Sort  (cost=69.99..70.14 rows=60 width=63) (actual time=16.642..16.643 rows=10 loops=1)
         Sort Key: frequency_threshold DESC
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Index Scan using reported_events_organ_system_adverse_event_term_idx on reported_events  (cost=0.56..68.69 rows=60 width=63) (actual time=0.026..12.896 rows=18361 loops=1)
               Index Cond: (((organ_system)::text = 'Respiratory, thoracic and mediastinal disorders'::text) AND ((adverse_event_term)::text = 'Hypoxia'::text))
 Planning Time: 0.111 ms
 Execution Time: 16.666 ms
(8 rows)

Create statistics

CREATE STATISTICS (mcv) on organ_system, adverse_event_term FROM reported_events;
ANALYZE reported_events;

After

EXPLAIN ANALYZE
SELECT nct_id, organ_system, adverse_event_term,
frequency_threshold
FROM reported_events
WHERE organ_system = 'Respiratory, thoracic and mediastinal disorders'
AND adverse_event_term = 'Hypoxia'
ORDER BY frequency_threshold DESC
LIMIT 10;
                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19198.72..19198.75 rows=10 width=63) (actual time=16.500..16.502 rows=10 loops=1)
   ->  Sort  (cost=19198.72..19241.43 rows=17082 width=63) (actual time=16.499..16.500 rows=10 loops=1)
         Sort Key: frequency_threshold DESC
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Index Scan using reported_events_organ_system_adverse_event_term_idx on reported_events  (cost=0.56..18829.59 rows=17082 width=63) (actual time=0.026..12.844 rows=18361 loops=1)
               Index Cond: (((organ_system)::text = 'Respiratory, thoracic and mediastinal disorders'::text) AND ((adverse_event_term)::text = 'Hypoxia'::text))
 Planning Time: 0.152 ms
 Execution Time: 16.524 ms
(8 rows)

In this example, having statistics didn’t change the query plan, or improve the query time. It did make the expected number of rows in the index scan go from 60 to 17082, so closer to the actual 18361 rows.

It’s a good example of a case where inaccurate statistics didn’t hurt your performance. Choosing which statistics are important might just be part of your adventure. You might need to test different approaches, and look at your query plan.

Choosing between extended statistics

The type of extended statistics that you will create depends on the operation that you use:

If you only use =, use dependencies
If you have GROUP BYs, you will need ndistinct
If you use scalar operators, you need an MCV list

Limitations of extended statistics

Because histograms aren’t supported in extended statistics, they are only accurate:

For MCVs
If the rest of your dataset is evenly distributed

If Postgres cannot use extended statistics, it falls back to the default per-column statistics.

You can improve this by increasing the statistic target, either for a column, or for your entire database.

default_statistics_target: default is 100, can go from 1 to 10000.
ALTER TABLE facilities ALTER COLUMN city SET STATISTICS 200;

Altering statistics targets can improve things, but for very rare values you will still see differences between the estimated and actual number of rows. Going straight to the maximum statistics target is not recommended, because it makes ANALYZE and vacuum slow and expensive. Still, this is something to be aware of, and it might be acceptable: rare values are often queried less than common ones, so you can decide that slightly slower uncommon queries are fine.

To go further with table statistics

A lot more is covered in the slides for "A Deep Dive into Statistics" presented at last year’s PGConfEU 2024. There is quite a bit more covered in there about the specific algorithms used by Postgres to compute selectivity, statistics, etc.

If you are interested in learning more about this through reading the source code, here is where you can start:

Algorithms to compute the selectivity of each clause: src/backend/utils/adt/selfuncs.c
How the optimizer combines selectivities for AND/OR/JOIN: src/backend/optimizer/path/clausesel.c
Calculating the cost: src/backend/optimizer/path/costsize.c

Conclusion

Postgres keeps detailed table statistics to track most common values and other details about the column data. These statistics are used by the query planner and table statistics themselves can directly impact query performance.

Postgres users can add extended statistics and explicitly tell Postgres about functional dependencies and correlations between columns. In some cases, these can dramatically improve query planning and performance.

With just a few lines of SQL, you can help the planner make smarter choices, reduce execution time, and get the most out of your database. As always, testing and evaluating with EXPLAIN is the best way to experiment and confirm additional table statistics.

OpenTelemetry Observability in Crunchy Postgres for Kubernetes

Andrew.L'Ecuyer@crunchydata.com (Andrew L'Ecuyer) — Wed, 09 Apr 2025 09:00:00 EDT

In today's landscape of complex systems with numerous observability options, OpenTelemetry has emerged as the standard for collecting logging and metrics. It creates a vendor-agnostic platform that works with almost any source and destination by taking in logs and metrics from all components, standardizing them, and routing them where needed. Though setup requires effort, the payoff is substantial: a unified view of your entire system, even across distributed environments.

With Crunchy Postgres for Kubernetes 5.8, we've automated OpenTelemetry for your Postgres databases and related services. By activating a few settings, your metrics and logs can flow to various OpenTelemetry-compatible systems, letting you focus on maintaining healthy Postgres deployments without the burden of building complex monitoring infrastructure.

Let's explore why we chose OpenTelemetry as our foundation, how to set it up, and ways to extend it to deliver your critical data to external systems.

Observability for Postgres

At its heart, observability means understanding a system's state by analyzing its outputs. This includes examining logs and watching performance metrics to gain broader insights into system behavior and troubleshoot issues. Collecting this data centrally provides comprehensive visibility without requiring deep knowledge of internal implementations.

Example of some of the logs and metrics available from Postgres

Managing Postgres at scale requires viewing all system outputs consistently. The challenge is that each system has its own logging solution with different formats for logs and metrics, and the sheer volume can be overwhelming.

This is where OpenTelemetry comes into play. By providing a consistent standard and framework for managing these external outputs, OpenTelemetry allows you to bring consistency to the various outputs produced by the systems within your Postgres deployments. This consistency not only streamlines and simplifies your ability to view logging and metrics information, but it also allows you to manage those logs and metrics using a variety of OpenTelemetry-compatible services and backends.

This means you can focus on viewing and understanding your system’s outputs, without having to worry about vendor lock-in or reimplementing your solution for collecting logs and metrics when switching and/or trying out new observability solutions and strategies.

Crunchy Postgres for Kubernetes now leverages OpenTelemetry out-of-the-box to unify logs and metrics from all your Postgres databases. This covers not just the database itself, but also all of the supporting systems and services. Since Crunchy Postgres for Kubernetes handles the heavy lifting of collecting, filtering, and transforming data according to OpenTelemetry standards, you can concentrate on analyzing those outputs through your preferred observability platforms and dashboards.

OpenTelemetry in Action using Crunchy Postgres for Kubernetes

Using OpenTelemetry within Crunchy Postgres for Kubernetes is easy! This section will walk you through the steps required to enable OpenTelemetry for both metrics and logging. As you will see, with just a few simple steps you can enhance the observability of your Postgres deployments, and obtain deeper insights into the health of the databases and components comprising those deployments.

Enabling OpenTelemetry for Logging & Metrics

In order to use OpenTelemetry for logging and metrics, the OpenTelemetryLogging and OpenTelemetryMetrics feature gates must be enabled. Please see the Feature Gate Installation Guide for guidance on how to enable these feature gates using your method of installing Crunchy Postgres for Kubernetes.

Once the feature gates are enabled and your PostgresCluster spec has been updated according to the OpenTelemetry logging and metrics guides, you will see OpenTelemetry collector sidecars deployed alongside the various Pods deployed for any new or existing PostgresClusters, as shown in the following diagram:

These sidecars will be responsible for collecting and transforming any logs and metrics, and then exporting them to one or more OpenTelemetry-compatible services or backends.

Configuring OpenTelemetry for Google Cloud Logging

Let’s look at how you can use Crunchy Postgres for Kubernetes and OpenTelemetry to export logs to Google Cloud Logging. This will provide you with a robust solution for viewing logging information across the various components comprising your Postgres cluster.

Since we will be using Google Cloud Logging, I will be using GKE for my Kubernetes. This will allow me to leverage some of the built-in methods Google provides to streamline and simplify exporting OpenTelemetry logs to Google Cloud Logging.

First, make sure that the OpenTelemetryLogging and OpenTelemetryMetrics feature gates are enabled. Then create a PostgresCluster with two Postgres replicas, as well as pgBackRest and PgBouncer deployments:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo-otel
spec:
  imagePullSecrets:
    - name: crunchy-regcred
  postgresVersion: 17
  instances:
    - name: instance1
      dataVolumeClaimSpec:
        accessModes:
          - 'ReadWriteOnce'
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes:
                - 'ReadWriteOnce'
              resources:
                requests:
                  storage: 1Gi
  proxy:
    pgBouncer: {}

This will allow us to capture OpenTelemetry logs from the Postgres database, as well as from the Disaster Recovery and connection pooling components.

Next, update the PostgresCluster with the following instrumentation section:

instrumentation:
  config:
    detectors:
      - name: gcp
    exporters:
      googlecloud:
        log:
          default_log_name: 'collector-exported-log'
          resource_filters:
            - prefix: 'k8s'
            - prefix: 'db'
  logs:
    exporters: ['googlecloud']

This will add a simple OpenTelemetry exporter for Google Cloud Logging. Note that the googlecloud exporter configuration comes directly from the Google Cloud Exporter configuration reference, which means you can further tune your exporter as needed according to that doc. For additional details about the other settings available under the instrumentation section, see the OpenTelemetry logging guide in the Crunchy Postgres for Kubernetes documentation.

Now let's navigate to Google Cloud Logging to view logs for hippo-otel. We will specifically navigate to the “Logs Explorer” view, and then run the following query:

logName=~".*/logs/collector-exported-log"

Running this query will display any logs coming from the PostgresCluster, according to the default_log_name configured for the exporter in the previous step.

And that's it! With just these few steps, you can now start viewing and analyzing logs for your deployment using Google Cloud Logging.

Let's say you want to look at any connection pooling logs coming from the hippo-otel cluster. Doing this is as simple as adding the following to your query:

resource.labels.container_name="pgbouncer"

And then running the query again.

So as you can clearly see, Crunchy Postgres for Kubernetes puts the various logs for your system directly at your fingertips, allowing you to quickly, easily and seamlessly access the logging information you require to better understand your running deployments.

Setting up OpenTelemetry Metrics

In Crunchy Postgres for Kubernetes v5.8, the switch to OpenTelemetry for metrics is a drop-in replacement for the previous monitoring architecture that used the Prometheus Postgres Exporter. You can therefore still leverage the power of the pgMonitor project to obtain crucial insights into your Postgres deployments, and then visualize that metrics information using Grafana dashboards. However, instead of the Postgres Exporter running as a sidecar alongside the components making up your PostgresCluster, an OpenTelemetry collector is now used. This means your metrics data will now be exported into Prometheus using the OpenTelemetry standard.

Let’s go ahead and enable OpenTelemetry metrics for the hippo-otel PostgresCluster we created in the previous demonstration of OTel logging.

If you followed the steps above, the OpenTelemetryMetrics feature gate should already be enabled, so configuring OpenTelemetry metrics simply requires the following in your spec:

spec:
  instrumentation: {}

From there, install Prometheus and Grafana according to installation instructions to start viewing any OpenTelemetry metrics. For this example, I will install these components using the crunchy-monitoring Helm chart.

helm install crunchy-monitoring -n postgres-operator \
oci://registry.developers.crunchydata.com/crunchydata/crunchy-monitoring \
--set-json grafana='{"admin":{"username":"admin","password":"admin"}}'

Next, port-forward to the Grafana Service, and log in using the credentials provided in the previous step.

kubectl -n postgres-operator port-forward service/crunchy-monitoring-grafana 3000:3000

And that's it! You can now view metrics for your database using the data that has been collected using OpenTelemetry.

For instance, just like we viewed the PgBouncer logs, let’s now visualize PgBouncer metrics. PgBouncer metrics are a new addition to the Crunchy Postgres for Kubernetes Metrics and Monitoring stack for the Crunchy Postgres for Kubernetes v5.8 release.

For a better understanding of the metrics and dashboards available, please see the Monitoring Architecture page in the Crunchy Postgres for Kubernetes documentation. And while this initial release of OpenTelemetry metrics focuses on exporting data to Prometheus for use with the pgMonitor dashboards, stay tuned for future releases where it will be possible to export your metrics data into a variety of different OpenTelemetry-compatible services and backends by defining your own exporters!

Conclusion

I hope you enjoyed seeing just how easy it is to enhance the observability of your Postgres databases using Crunchy Postgres for Kubernetes and OpenTelemetry, and how these powerful new features better equip you to manage, maintain, and understand your Postgres deployments in Kubernetes. To continue the conversation around anything discussed in this blog, please feel free to reach out to your account team, or via the Crunchy Discord Server. I look forward to hearing about your observability goals, and how Crunchy Postgres for Kubernetes 5.8 will help to simplify, streamline, and enhance the observability of your Postgres deployments!

Creating Histograms with Postgres

Christopher.Winslett@crunchydata.com (Christopher Winslett) — Fri, 04 Apr 2025 10:00:00 EDT

Histograms were first used in a lecture in 1892 by Karl Pearson, the godfather of mathematical statistics. With how many data presentation tools we have today, it’s hard to think that representing data as a graphic was classified as “innovation”, but it was. Histograms are a graphic presentation of the distribution and frequency of data. If you haven’t seen one recently, or don’t know the word histogram off the top of your head, it is a bar chart in which each bar represents the count of data points within a defined range of values. When Pearson built the first histogram, he calculated it by hand. Today we can use SQL (or even Excel) to extract this data continuously across large data sets.

While true statistical histograms involve a bit more complexity in choosing bin ranges, for many business intelligence purposes Postgres width_bucket is good enough for counting data inside bins with minimal effort.

Postgres width_bucket for histograms

Given the number of buckets and a min/max value, width_bucket returns the index of the bucket that a value falls into. For instance, given a minimum value of 0, a maximum value of 100, and 10 buckets, a value of 43 would fall in bucket #5:

SELECT width_bucket(43, 0, 100, 10) AS bucket;

But 5 is not correct for 43, or is it?

You can see how the values would fall using generate_series (shown below using Metabase):

SELECT value, width_bucket(value, 0, 100, 10) AS bucket FROM generate_series(0, 100) AS value;

When running the query, the values 0 through 9 go into bucket 1. As you can see in the image above, width_bucket behaves as a step function that starts indexing with 1. In this scenario, when passed a value of 100, width_bucket returns 11, because the maximum value given to width_bucket is an exclusive upper bound (i.e. the logic is minimum <= value < maximum).
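
To make the edge behavior concrete, here is a small sketch that probes width_bucket at and around the bounds used above. The sample values are arbitrary picks for illustration, and the comments list the bucket each one lands in:

```sql
-- Probe width_bucket(value, 0, 100, 10) at and around the range boundaries.
SELECT
  value,
  width_bucket(value, 0, 100, 10) AS bucket
FROM (VALUES (-1), (0), (9), (10), (43), (99), (100)) AS t (value)
ORDER BY value;

-- value | bucket
--    -1 |      0   (below the minimum)
--     0 |      1
--     9 |      1
--    10 |      2
--    43 |      5   (40 <= 43 < 50)
--    99 |     10
--   100 |     11   (at or above the maximum)
```

Bucket 0 and bucket 11 (the bucket count plus one) act as overflow buckets for values that fall outside the requested range.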

We can use the bucket value to generate more readable labels.
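
As a minimal sketch of that idea, assuming the same fixed range of 0 to 100 with 10 buckets (so each bucket is 10 wide), the bucket index can be turned back into a label with a little arithmetic. The sample values are made up for illustration:

```sql
-- Turn a bucket index back into a human-readable range label.
-- The bucket width of 10 comes from (100 - 0) / 10.
SELECT
  value,
  bucket,
  CONCAT((bucket - 1) * 10, ' - ', bucket * 10 - 1) AS range
FROM (
  SELECT value, width_bucket(value, 0, 100, 10) AS bucket
  FROM (VALUES (7), (43), (99)) AS t (value)
) AS bucketed
ORDER BY value;

-- value | bucket |  range
--     7 |      1 | 0 - 9
--    43 |      5 | 40 - 49
--    99 |     10 | 90 - 99
```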

Auto-formatting histogram with SQL

Let’s build out a larger query that creates ranges, range labels, and formats the histogram. We will start by using a synthetic table within a CTE called formatted_data. We are doing it this way so that we can replace that query with new data in the future.

Here’s the beginning of the query (this is copy-pastable into Postgres):

WITH formatted_data AS (
  SELECT * FROM (VALUES (13), (42), (18), (62), (93), (47), (51), (41), (1)) AS t (value)
)
SELECT
  WIDTH_BUCKET(value, 0, 100, 10) AS bucket,
  COUNT(value)
FROM formatted_data
  GROUP BY 1
  ORDER BY 1;

Let’s use another CTE to define some settings for our width_bucket:

WITH formatted_data AS (
  SELECT * FROM (VALUES (13), (42), (18), (62), (93), (47), (51), (41), (1)) AS t (value)
), bucket_settings AS (
	SELECT
		10 as bucket_count,
		0::integer AS min_value, -- can be null::integer or an integer
		100::integer AS max_value -- can be null::integer or an integer
)

SELECT
  WIDTH_BUCKET(value,
	  (SELECT min_value FROM bucket_settings),
		(SELECT max_value FROM bucket_settings),
		(SELECT bucket_count FROM bucket_settings)
	) AS bucket,
  COUNT(value)
FROM formatted_data
  GROUP BY 1
  ORDER BY 1;

In the bucket_settings CTE, we use ::integer to cast any value there as an integer. We do this since we will want to compare NULL against other integers later. If we don’t cast NULLs then the SQL will fail.
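
A minimal sketch of why the cast matters, using a cut-down bucket_settings CTE and a hard-coded fallback of 1 (both chosen only for illustration):

```sql
-- A typed NULL lets COALESCE fall back to an integer default.
WITH bucket_settings AS (
  SELECT NULL::integer AS min_value
)
SELECT COALESCE((SELECT min_value FROM bucket_settings), 1) AS min_value;
-- returns 1

-- Without the ::integer cast, recent Postgres versions resolve the bare NULL
-- column as text, and combining it with an integer in COALESCE raises a
-- type mismatch error.
```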

Now, we will use a CTE called calculated_bucket_settings to set a dynamic range if the static range is not defined. This will let the data specify the values if they are not defined by the bucket_settings:

WITH formatted_data AS (
  SELECT * FROM (VALUES (13), (42), (18), (62), (93), (47), (51), (41), (1)) AS t (value)
), bucket_settings AS (
	SELECT
		5 AS bucket_count,
		null::integer AS min_value, -- can be null or an integer
		null::integer AS max_value -- can be null or an integer
), calculated_bucket_settings AS (
	SELECT
		(SELECT bucket_count FROM bucket_settings) AS bucket_count,
		COALESCE(
			(SELECT min_value FROM bucket_settings),
			(SELECT min(value) FROM formatted_data)
		) AS min_value,
		COALESCE(
			(SELECT max_value FROM bucket_settings),
			(SELECT max(value) + 1 FROM formatted_data)
		) AS max_value
), histogram AS (
  SELECT
     WIDTH_BUCKET(value, min_value, max_value, (SELECT bucket_count FROM bucket_settings)) AS bucket,
     COUNT(value) AS frequency
   FROM formatted_data, calculated_bucket_settings
   GROUP BY 1
   ORDER BY 1
)

SELECT
   bucket,
   frequency,
   CONCAT(
     (min_value + (bucket - 1) * (max_value - min_value) / bucket_count)::INT,
     ' - ',
     (((min_value + bucket * (max_value - min_value) / bucket_count)) - 1)::INT) AS range
FROM histogram, calculated_bucket_settings;

When the upper bound is derived from the data, we use max(value) + 1 because width_bucket treats the maximum as an exclusive bound; without the + 1, the largest value would land in an overflow bucket. Also, because we are working with integers, when we create the pretty label for the range, we subtract 1 from the maximum value of each range to avoid what would otherwise look like overlapping ranges. This decision fits into the “good-enough for business intelligence” caveat mentioned above. We could have changed the label logic to be 75 <= value < 94 in lieu of the subtraction, but most folks like to see the dash instead of math logic for a histogram.

The query above will give results like the following:

bucket   | frequency |  range
---------+-----------+---------
       1 |         3 | 1 - 18
       3 |         4 | 38 - 55
       4 |         1 | 56 - 74
       5 |         1 | 75 - 93
(4 rows)
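
The range labels in that output come from integer arithmetic on the calculated settings. As a sketch, assuming the values the query derives for this sample data (min_value = 1, max_value = 94, which is max(value) + 1, and bucket_count = 5), the label boundaries for every bucket can be computed on their own:

```sql
-- Recompute the label boundaries by hand for min = 1, max = 94, 5 buckets.
-- Integer division truncates, which is what produces whole-number labels.
SELECT
  bucket,
  1 + (bucket - 1) * (94 - 1) / 5 AS range_start,
  (1 + bucket * (94 - 1) / 5) - 1 AS range_end
FROM generate_series(1, 5) AS bucket
ORDER BY bucket;

-- bucket | range_start | range_end
--      1 |           1 |        18
--      2 |          19 |        37
--      3 |          38 |        55
--      4 |          56 |        74
--      5 |          75 |        93
```

Note that the edges width_bucket actually uses are fractional here (each bucket is 18.6 wide), so the integer labels are a rounded approximation of the true boundaries.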

Now we see that not all buckets are represented: a bucket with no values simply does not appear in the output. So, if a bucket is empty, we need to fill in its frequency with a zero. This is where SQL requires thinking in sets. We can use generate_series to generate all values for the buckets, then join the histogram to all values. Flipping the order of the query around makes it simpler than joining an incomplete set. In the following query, we’ve built out the buckets in the all_buckets CTE, then joined that to the histogram values:

WITH formatted_data AS (
  SELECT * FROM (VALUES (13), (42), (18), (62), (93), (47), (51), (41), (1)) AS t (value)
), bucket_settings AS (
  SELECT
        5 AS bucket_count,
        0::integer AS min_value, -- can be null or an integer
        100::integer AS max_value -- can be null or an integer
), calculated_bucket_settings AS (
	SELECT
	  (SELECT bucket_count FROM bucket_settings) AS bucket_count,
	  COALESCE(
	          (SELECT min_value FROM bucket_settings),
	          (SELECT min(value) FROM formatted_data)
	  ) AS min_value,
	  COALESCE(
	          (SELECT max_value FROM bucket_settings),
	          (SELECT max(value) + 1 FROM formatted_data)
	  ) AS max_value
), histogram AS (
  SELECT
    WIDTH_BUCKET(value, calculated_bucket_settings.min_value, calculated_bucket_settings.max_value + 1, (SELECT bucket_count FROM bucket_settings)) AS bucket,
    COUNT(value) AS frequency
  FROM formatted_data, calculated_bucket_settings
  GROUP BY 1
  ORDER BY 1
 ), all_buckets AS (
  SELECT
    fill_buckets.bucket AS bucket,
    FLOOR(calculated_bucket_settings.min_value + (fill_buckets.bucket - 1) * (calculated_bucket_settings.max_value - calculated_bucket_settings.min_value) / (SELECT bucket_count FROM bucket_settings)) AS min_value,
    FLOOR(calculated_bucket_settings.min_value + fill_buckets.bucket * (calculated_bucket_settings.max_value - calculated_bucket_settings.min_value) / (SELECT bucket_count FROM bucket_settings)) AS max_value
  FROM calculated_bucket_settings,
	  generate_series(1, calculated_bucket_settings.bucket_count) AS fill_buckets (bucket))

 SELECT
   all_buckets.bucket AS bucket,
   CASE
   WHEN all_buckets IS NULL THEN
	   'out of bounds'
	 ELSE
     CONCAT(all_buckets.min_value, ' - ', all_buckets.max_value - 1)
   END AS range,
   SUM(COALESCE(histogram.frequency, 0)) AS frequency
 FROM all_buckets
 FULL OUTER JOIN histogram ON all_buckets.bucket = histogram.bucket
 GROUP BY 1, 2
 ORDER BY bucket;

Try modifying the values in the bucket_settings CTE to see how the histogram responds. By changing the bucket_count, min_value, or max_value, you’ll see the histogram adjust accordingly. If you modify the range so that it excludes some values, the FULL OUTER JOIN ensures that all non-classified items are bucketed as “out of bounds”.
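
Stripped of the settings logic, the gap-filling and out-of-bounds handling can be sketched as follows. The histogram rows here are hard-coded, hypothetical counts: buckets 0 and 6 stand in for the overflow buckets that width_bucket returns for values outside a 5-bucket range.

```sql
-- Hypothetical pre-aggregated counts; buckets 0 and 6 are overflow buckets.
WITH histogram (bucket, frequency) AS (
  VALUES (0, 2), (1, 3), (3, 4), (6, 1)
)
SELECT
  all_buckets.bucket AS bucket,
  CASE
    WHEN all_buckets.bucket IS NULL THEN 'out of bounds'
    ELSE 'in range'
  END AS status,
  SUM(COALESCE(histogram.frequency, 0)) AS frequency
FROM generate_series(1, 5) AS all_buckets (bucket)
FULL OUTER JOIN histogram ON all_buckets.bucket = histogram.bucket
GROUP BY 1, 2
ORDER BY 1;

-- bucket |    status     | frequency
--      1 | in range      |         3
--      2 | in range      |         0
--      3 | in range      |         4
--      4 | in range      |         0
--      5 | in range      |         0
--        | out of bounds |         3
```

Empty buckets come from the generate_series side of the join, while the overflow rows come from the histogram side and collapse into a single row because their all_buckets.bucket is NULL.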

Using a presentation tool, display the histogram as a bar chart (shown below using Metabase):

Real Life Data with Histograms

Now that we have a really nice auto-adjusting query, we can easily build histograms from other data. I have a small experimental database loaded from a database of clinical trials.

What if we wanted to build a histogram for the count of participants in various clinical trial studies? To start, build the query that finds the number of participants for each study:

SELECT
	outcomes.nct_id,
	max(outcome_counts.count) AS value
FROM outcomes
INNER JOIN outcome_counts ON outcomes.id = outcome_counts.outcome_id
WHERE param_type = 'COUNT_OF_PARTICIPANTS'
GROUP BY 1

We can take the above query, and place it in the formatted_data CTE:

WITH formatted_data AS (
	SELECT
		outcomes.nct_id,
		MAX(outcome_counts.count) AS value
	FROM outcomes
	INNER JOIN outcome_counts ON outcomes.id = outcome_counts.outcome_id
	WHERE param_type = 'COUNT_OF_PARTICIPANTS'
	GROUP BY 1
), bucket_settings AS (
  SELECT
        20 AS bucket_count,
        null::integer AS min_value, -- can be null or an integer
        null::integer AS max_value -- can be null or an integer
), calculated_bucket_settings AS (
	SELECT
	  (SELECT bucket_count FROM bucket_settings) AS bucket_count,
	  COALESCE(
	          (SELECT min_value FROM bucket_settings),
	          (SELECT min(value) FROM formatted_data)
	  ) AS min_value,
	  COALESCE(
	          (SELECT max_value FROM bucket_settings),
	          (SELECT max(value) + 1 FROM formatted_data)
	  ) AS max_value
), histogram AS (
  SELECT
    WIDTH_BUCKET(value, calculated_bucket_settings.min_value, calculated_bucket_settings.max_value + 1, (SELECT bucket_count FROM bucket_settings)) AS bucket,
     COUNT(value) AS frequency
   FROM formatted_data, calculated_bucket_settings
   GROUP BY 1
   ORDER BY 1
 ), all_buckets AS (
   SELECT
     fill_buckets.bucket AS bucket,
     FLOOR(calculated_bucket_settings.min_value + (fill_buckets.bucket - 1) * (calculated_bucket_settings.max_value - calculated_bucket_settings.min_value) / (SELECT bucket_count FROM bucket_settings)) AS min_value,
     FLOOR(calculated_bucket_settings.min_value + fill_buckets.bucket * (calculated_bucket_settings.max_value - calculated_bucket_settings.min_value) / (SELECT bucket_count FROM bucket_settings)) AS max_value
   FROM calculated_bucket_settings,
	   generate_series(1, calculated_bucket_settings.bucket_count) AS fill_buckets (bucket))

 SELECT
   all_buckets.bucket AS bucket,
   CASE
   WHEN all_buckets IS NULL THEN
	   'out of bounds'
	 ELSE
     CONCAT(all_buckets.min_value, ' - ', all_buckets.max_value - 1)
   END AS range,
   SUM(COALESCE(histogram.frequency, 0)) AS frequency
 FROM all_buckets
 FULL OUTER JOIN histogram ON all_buckets.bucket = histogram.bucket
 GROUP BY 1, 2
 ORDER BY bucket;

The query will output the following. This is a bit undesirable because the distribution is concentrated in the first bucket:

 bucket |       range       | frequency
--------+-------------------+-----------
      1 | 1 - 359943        |     23261
      2 | 359944 - 719886   |         3
      3 | 719887 - 1079829  |         1
      4 | 1079830 - 1439773 |         0
      5 | 1439774 - 1799716 |         1
      6 | 1799717 - 2159659 |         0
      7 | 2159660 - 2519602 |         0
      8 | 2519603 - 2879546 |         0
      9 | 2879547 - 3239489 |         0
     10 | 3239490 - 3599432 |         0
     11 | 3599433 - 3959375 |         0
     12 | 3959376 - 4319319 |         0
     13 | 4319320 - 4679262 |         0
     14 | 4679263 - 5039205 |         0
     15 | 5039206 - 5399148 |         0
     16 | 5399149 - 5759092 |         0
     17 | 5759093 - 6119035 |         0
     18 | 6119036 - 6478978 |         0
     19 | 6478979 - 6838921 |         0
     20 | 6838922 - 7198865 |         1
(20 rows)

To improve the presentation, we can adjust the bucket_settings CTE to modify how the buckets are defined. For instance, with this dataset, we could change the bucket settings to:

  SELECT
        20 AS bucket_count,
        0::integer AS min_value, -- can be null or an integer
        1000::integer AS max_value -- can be null or an integer

It outputs a much nicer distribution of data:

 bucket |     range     | frequency
--------+---------------+-----------
      1 | 0 - 49        |     13584
      2 | 50 - 99       |      3612
      3 | 100 - 149     |      1720
      4 | 150 - 199     |       942
      5 | 200 - 249     |       645
      6 | 250 - 299     |       477
      7 | 300 - 349     |       338
      8 | 350 - 399     |       237
      9 | 400 - 449     |       176
     10 | 450 - 499     |       137
     11 | 500 - 549     |       150
     12 | 550 - 599     |       101
     13 | 600 - 649     |        77
     14 | 650 - 699     |        58
     15 | 700 - 749     |        61
     16 | 750 - 799     |        41
     17 | 800 - 849     |        41
     18 | 850 - 899     |        33
     19 | 900 - 949     |        36
     20 | 950 - 999     |        43
        | out of bounds |       758
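
The bounds in the adjusted settings were picked by hand. One possible way to find a starting point for skewed data, which is not part of the query above, is to compare the maximum with a high percentile using the built-in percentile_disc ordered-set aggregate. This is only a sketch on made-up values that include a single extreme outlier:

```sql
-- Made-up sample: mostly small values plus one extreme outlier.
WITH formatted_data AS (
  SELECT * FROM (
    VALUES (13), (42), (18), (62), (93), (47), (51), (41), (1), (7198865)
  ) AS t (value)
)
SELECT
  max(value) AS max_value,                                    -- dominated by the outlier
  percentile_disc(0.9) WITHIN GROUP (ORDER BY value) AS p90   -- a value most of the data stays under
FROM formatted_data;
```

A percentile like this could then be rounded to a convenient number and used as max_value in bucket_settings, leaving the outliers to the “out of bounds” row.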

In brief

Postgres width_bucket sorts values into buckets so you can count frequencies and build histograms.
- The function assigns values to predefined buckets based on a min/max range and bucket count.
- By casting NULL settings to integers, you can leave the min/max undefined and let the data set them.
- Values that fall outside the defined range can be grouped as “out of bounds”.
By using Common Table Expressions (CTEs), you can define bucket settings dynamically with auto-adjusting bins based on the dataset.
Histograms aid with the visualization of data and its distribution. They show how frequently data points appear within specific ranges (bins), making it easier to understand patterns, trends, and outliers. Bin size affects interpretation, so choosing the right number of bins is crucial: too few can oversimplify the data, while too many can create noise and obscure trends.

Built an interesting histogram? Show us @crunchydata!

Introducing Crunchy Postgres for Kubernetes 5.8: OpenTelemetry, API enhancements, UBI-9 and More

Greg.Nokes@crunchydata.com (Greg Nokes) — Thu, 03 Apr 2025 09:00:00 EDT

Today's release of Crunchy Postgres for Kubernetes, version 5.8, is a substantial update that introduces a range of features designed to revolutionize your data infrastructure and observability. Whether you are a seasoned DevOps engineer, a database administrator, or an application developer seeking a reliable Postgres environment, this version offers enhancements to streamline your workflows and enhance the overall efficiency of your deployments.

The developer-focused improvements are:

Enhanced Monitoring and Insights: OpenTelemetry integration provides faster troubleshooting and improved performance tuning.
Strengthened Security and Reliability: UBI-9 based containers enhance security and offer regular patches.
Simplified API & YAML: improved experience for Postgres configuration.

Enhanced Observability with OpenTelemetry

By embracing OpenTelemetry, Crunchy Postgres for Kubernetes 5.8 removes the need to set up disparate monitoring tools, freeing up your time to focus on issues more important than shipping logs and metrics. A variety of OTel services and backends are deployed alongside Postgres to form a new observability system based on OpenTelemetry. Installation, configuration, and maintenance happen automatically, and all logs and metrics generated by the components are aggregated in real time. This integration leverages the robust capabilities of pgMonitor, and also enables streaming logs and metrics to other systems. This new approach enables easy setup of cross-platform monitoring and analysis, ensuring a unified view of your Postgres infrastructure.

UBI-9 Based Containers

Red Hat Universal Base Image 9 (UBI 9) containers are shipped to enhance security and reliability. UBI 9 delivers an updated security architecture with a stable, consistently patched foundation and seamless compatibility across diverse environments. An efficient update pipeline ensures you're always protected with the latest critical patches and enhancements. Choosing to switch to UBI 9 helps maintain the security and compliance of your application environment today and in the future. UBI 9 may also have better overall performance than UBI 8 due to changes in the libraries, memory usage, and file system.

Enhanced Postgres Configuration Experience

When managing a Postgres cluster, you want to be confident that the database server has been configured according to your specific needs. And should you make a configuration mistake, you want to be aware of that mistake as soon as possible. There is nothing worse than thinking your database is properly configured, only to find out later that it is not!

Thanks to two great new enhancements to the PostgresCluster API, you can be more confident than ever that custom Postgres configuration settings have been successfully applied to a Postgres cluster. The new spec.config.parameters section provides a better way of supplying custom configuration parameters, and the new spec.authentication section provides a better way of supplying custom client authentication settings. Both improve validation while providing an intuitive user experience for settings.

Let's look at a common misconfiguration, and attempt to customize the Postgres port. While the spec.port field is the proper way to configure this setting in the PostgresCluster spec, many initially try to configure ports using the Postgres parameter. The wrong configuration is as follows:

spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          port: 5555

Previously, unless you were paying close attention to the logs of your database Pod, you could easily miss that the change failed silently with the following warning:

WARNING: postgresql parameter port=5555 failed validation, defaulting to None

With the 5.8 update, now try to make the same change within the spec.config.parameters section of the PostgresCluster spec:

spec:
  config:
    parameters:
      port: 5555

When attempting to apply the change, not only is the setting rejected, but the error message tells you exactly how the port should be configured!

The PostgresCluster "hippo" is invalid: spec.config.parameters: Invalid value: "object": change port using .spec.port instead

This points you back to the right track to configure the Postgres port the proper way:

spec:
  port: 5555

Upgrade

Crunchy Postgres for Kubernetes builds on our experience running Postgres in Kubernetes. The additional features simplify previously complex experiences, and increase your chance of success.

See our full release notes
See our documentation for details on the upgrade process.

If you require more details or hands-on guidance, please visit the Crunchy Postgres for Kubernetes documentation or contact our account team. Also, join our Discord community forums to connect with other users and share your experiences.