Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Featured Post

8 min read

Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics

Marco Slot

We are excited to release Crunchy Data Warehouse, a modern data warehouse for Postgres. Crunchy Data Warehouse combines Postgres with Iceberg, Parquet, and data lake formats for fast analytics queries and cost efficient storage.

Read This article
  • 6 min read

    Crunchy Data Warehouse: Postgres with Iceberg Available for Kubernetes and On-premises

    Craig Kerstiens

    Today I'm excited to announce the release of Crunchy Data Warehouse on premises, which provides one of the easiest and yet richest ways to work with your data lake in the environment of your choosing. Built on top of Crunchy Postgres for Kubernetes, Crunchy Data Warehouse extends Postgres with a modern data warehouse solution, giving you: • The ability to easily query data where it resides in S3 or S3 compatible storage (like MinIO). With a variety of data formats supported including CSV, JSO...

    Read More
  • 5 min read

    Reducing Cloud Spend: Migrating Logs from CloudWatch to Iceberg with Postgres

    Craig Kerstiens

    As a database service provider, we store a number of logs internally to audit and oversee what is happening within our systems. When we started out, the volume of these logs is predictably low, but with scale they grew rapidly. Given the number of databases we run for users on Crunchy Bridge, the volume of these logs has grown to a sizable amount. Until last week, we retained those logs in AWS CloudWatch. Spoiler alert: this is expensive. While we have a number of strategies to drive efficiency...

    Read More
  • 3 min read

    Postgres Security Checklist from the Center for Internet Security

    Elizabeth ChristensenDoug Hunley

    The Center for Internet Security (CIS) releases security benchmarks to cover a wide variety of infrastructure used in modern applications, including databases, operating systems, cloud services, containerized services, and even networking. Since 2016 Crunchy Data has collaborated with CIS to provide this security resource for those deploying Postgres. The output of this collaboration is a checklist for folks to follow and improve the security posture of Postgres deployments. The PostgreSQL CIS...

    Read More
  • 5 min read

    Automatic Iceberg Maintenance Within Postgres

    Önder Kalacı

    Today we're excited to announce built-in maintenance for Iceberg in Crunchy Data Warehouse . This enhancement to Crunchy Data Warehouse brings PostgreSQL-style maintenance directly to Iceberg. The warehouse autovacuum workers continuously optimize Iceberg tables by compacting data and cleaning up expired files. In this post, we'll explore how we handle cleanup, and in the follow-up posts, we'll take a deeper dive into compaction. If you use Postgres, you are probably familiar with tables and ro...

    Read More
  • Postgres Troubleshooting: Fixing Duplicate Primary Key Rows

    Greg Sabino Mullane

    Someone recently asked on the Postgres mailing lists about how to remove unwanted duplicate rows from their table. They are “unwanted” in that sense that the same value appears more than once in a column designated as a primary key. We’ve seen an uptick in this problem since glibc was kind enough to change the way they sorted things. This can lead to invalid indexes when one upgrades their OS and modifies the underlying glibc library. One of the main effects of a corrupted unique index is allowi...

    Read More
  • 9 min read

    Citus: The Misunderstood Postgres Extension

    Craig Kerstiens

    Citus is in a small class of the most advanced Postgres extensions that exist. While there are many Postgres extensions out there, few have as many hooks into Postgres or change the storage and query behavior in such a dramatic way. Most that come to Citus have very wrong assumptions. Citus turns Postgres into a sharded, distributed, horizontally scalable database (that's a mouthful), but it does so for very specific purposes. Citus, in general, is fit for these type of applications and only the...

    Read More
  • 2 min read

    Pi Day PostGIS Circles

    Paul Ramsey

    What's your favourite infinite sequence of non-repeating digits? There are some people who make a case for e , but to my mind nothing beats the transcendental and curvy utility of π, the ratio of a circle's circumference to its diameter. Drawing circles is a simple thing to do in PostGIS -- take a point, and buffer it. The result is circular, and we can calculate an estimate of pi just by measuring the perimeter of the unit circle. Except, look a little more closely -- this "circle" seems to...

    Read More
  • 7 min read

    Postgres, dbt, and Iceberg: Scalable Data Transformation

    Aykut Bozkurt

    Seamless integration of dbt with Crunchy Data Warehouse automates data movement between Postgres and Apache Iceberg. dbt’s modular SQL approach, combined with Iceberg’s scalable storage, and Postgres’ query engine means you can build fast, efficient, and reliable analytics—with minimal complexity. Today let’s dig into an example of using dbt with Postgres and Iceberg. The steps will be: 1. Set up Iceberg tables in Crunchy Data Warehouse using real-world real-time data from GitHub events 2. Confi...

    Read More
  • 4 min read

    Validating Data Types from Semi-Structured Data Loads in Postgres with pg_input_is_valid

    Elizabeth Christensen

    Working on big data loads and or data type changes can be tricky - especially finding and correcting individual errors across a large data set. Postgres versions, 16, 17, and newer have a new function to help with data validation: . is a sql function that can be queried that will determine if a given input can be parsed into a specific type like numeric, date, JSON, etc. Here’s a super basic query to ask if ‘123’ is a valid integer. This function gives a t-true and f-false response. So if I as...

    Read More
  • 7 min read

    Incremental Archival from Postgres to Parquet for Analytics

    Marco Slot

    PostgreSQL is commonly used to store event data coming from various kinds of devices. The data often arrives as individual events or small batches, which requires an operational database to capture. Features like time partitioning help optimize the storage layout for time range filtering and efficient deletion of old data. The PostgreSQL feature set gives you a lot of flexibility for handling a variety of IoT scenarios, but there are certain scenarios for it is less suitable, namely: • Long-te...

    Read More
  • Postgres Parallel Query Troubleshooting

    Brian Pace

    Postgres' ability to execute queries in parallel is a powerful feature that can significantly improve query performance, especially on large datasets. However, like all resources, parallel workers are finite. When there aren't enough available workers, Postgres may downgrade a parallel query to a serial (non-parallel) execution. This sounds reasonable unless the performance of the downgraded query is well beyond the required response times needed by the application. While helping our clients w...

    Read More