• Automatic Iceberg Maintenance Within Postgres

    Önder Kalacı

    Today we're excited to announce built-in maintenance for Iceberg in Crunchy Data Warehouse . This enhancement to Crunchy Data Warehouse brings PostgreSQL-style maintenance directly to Iceberg. The warehouse autovacuum workers continuously optimize Iceberg tables by compacting data and cleaning up expired files. In this post, we'll explore how we handle cleanup, and in the follow-up posts, we'll take a deeper dive into compaction. If you use Postgres, you are probably familiar with tables and ro...

    Read More
  • Postgres Troubleshooting: Fixing Duplicate Primary Key Rows

    Greg Sabino Mullane

    Someone recently asked on the Postgres mailing lists about how to remove unwanted duplicate rows from their table. They are “unwanted” in that sense that the same value appears more than once in a column designated as a primary key. We’ve seen an uptick in this problem since glibc was kind enough to change the way they sorted things. This can lead to invalid indexes when one upgrades their OS and modifies the underlying glibc library. One of the main effects of a corrupted unique index is allowi...

    Read More
  • Citus: The Misunderstood Postgres Extension

    Craig Kerstiens

    Citus is in a small class of the most advanced Postgres extensions that exist. While there are many Postgres extensions out there, few have as many hooks into Postgres or change the storage and query behavior in such a dramatic way. Most that come to Citus have very wrong assumptions. Citus turns Postgres into a sharded, distributed, horizontally scalable database (that's a mouthful), but it does so for very specific purposes. Citus, in general, is fit for these type of applications and only the...

    Read More
  • 2 min read

    Pi Day PostGIS Circles

    Paul Ramsey

    What's your favourite infinite sequence of non-repeating digits? There are some people who make a case for e , but to my mind nothing beats the transcendental and curvy utility of π, the ratio of a circle's circumference to its diameter. Drawing circles is a simple thing to do in PostGIS -- take a point, and buffer it. The result is circular, and we can calculate an estimate of pi just by measuring the perimeter of the unit circle. Except, look a little more closely -- this "circle" seems to...

    Read More
  • Postgres, dbt, and Iceberg: Scalable Data Transformation

    Aykut Bozkurt

    Seamless integration of dbt with Crunchy Data Warehouse automates data movement between Postgres and Apache Iceberg. dbt’s modular SQL approach, combined with Iceberg’s scalable storage, and Postgres’ query engine means you can build fast, efficient, and reliable analytics—with minimal complexity. Today let’s dig into an example of using dbt with Postgres and Iceberg. The steps will be: 1. Set up Iceberg tables in Crunchy Data Warehouse using real-world real-time data from GitHub events 2. Confi...

    Read More
  • 4 min read

    Validating Data Types from Semi-Structured Data Loads in Postgres with pg_input_is_valid

    Elizabeth Christensen

    Working on big data loads and or data type changes can be tricky - especially finding and correcting individual errors across a large data set. Postgres versions, 16, 17, and newer have a new function to help with data validation: . is a sql function that can be queried that will determine if a given input can be parsed into a specific type like numeric, date, JSON, etc. Here’s a super basic query to ask if ‘123’ is a valid integer. This function gives a t-true and f-false response. So if I as...

    Read More
  • Incremental Archival from Postgres to Parquet for Analytics

    Marco Slot

    PostgreSQL is commonly used to store event data coming from various kinds of devices. The data often arrives as individual events or small batches, which requires an operational database to capture. Features like time partitioning help optimize the storage layout for time range filtering and efficient deletion of old data. The PostgreSQL feature set gives you a lot of flexibility for handling a variety of IoT scenarios, but there are certain scenarios for it is less suitable, namely: • Long-te...

    Read More
  • Postgres Parallel Query Troubleshooting

    Brian Pace

    Postgres' ability to execute queries in parallel is a powerful feature that can significantly improve query performance, especially on large datasets. However, like all resources, parallel workers are finite. When there aren't enough available workers, Postgres may downgrade a parallel query to a serial (non-parallel) execution. This sounds reasonable unless the performance of the downgraded query is well beyond the required response times needed by the application. While helping our clients w...

    Read More
  • 10 min read

    Using Cloud Rasters with PostGIS

    Paul Ramsey

    With the extension, it is possible to access gigabytes of raster data from the cloud, without ever downloading the data . How? The venerable extension (released 13 years ago ) already has the critical core support built-in! Rasters can be stored inside the database, or outside the database, on a local file system or anywhere it can be accessed by the underlying GDAL raster support library. The storage options include S3, Azure, Google, Alibaba, and any HTTP server that supports RANG...

    Read More
  • Indexing Materialized Views in Postgres

    Elizabeth Christensen

    Materialized views are widely used in Postgres today. Many of us are working with using connected systems through foreign data wrappers, separate analytics systems like data warehouses , and merging data from different locations with Postgres queries. Materialized views let you precompile a query or partial table, for both local and remote data. Materialized views are static and have to be refreshed. One of the things that can be really important for using materialized views efficiently is inde...

    Read More