Posts about Analytics

Partitioning Analytics
5 min read
Craig Kerstiens · May 21, 2025
Archive Postgres Partitions to Iceberg
Postgres comes with built-in partitioning, and you can also layer in pg_partman for additional help with maintaining your partitions. It works quite well for partitioning your data to make it easy to retain a limited set of data and improve performance if your primary workload is querying a small, time-series-focused subset of data. Often, when implementing partitioning, you keep only a portion of your data and drop older data as it ages out to manage cost. But what if we could move ol...
Analytics
3 min read
Aykut Bozkurt · May 7, 2025
Announcing pg_parquet v.0.4.0: Google Cloud Storage, https storage, and more
What began as a hobby Rust project to explore the PostgreSQL extension ecosystem and the Parquet file format has grown into a handy component for folks integrating Postgres and Parquet into their data architecture. Today, we’re excited to release version 0.4 of pg_parquet. This release includes: • COPY TO/FROM Google Cloud Storage • COPY TO/FROM http(s) stores • COPY TO/FROM stdin/stdout with (FORMAT PARQUET) • Support for Parquet UUID, JSON, and JSONB types COPY TO/FROM Google Cloud Storage COPY TO/F...
Production Postgres Analytics
4 min read
Marco Slot · Apr 22, 2025
Logical replication from Postgres to Iceberg
Operational and analytical workloads have historically been handled by separate database systems, though they are starting to converge. We built Crunchy Data Warehouse to put PostgreSQL at the frontier of analytics systems, using modern technologies like Iceberg and a hybrid query engine. Combining operational and analytical capabilities is extremely useful, but it is not meant to drive all your workloads into a single system. In most organizations, application developers and analysts work...
Analytics
10 min read
Elizabeth Christensen, Christopher Winslett · Apr 4, 2025
Creating Histograms with Postgres
Histograms were first used in a lecture in 1892 by Karl Pearson, the godfather of mathematical statistics. With so many data presentation tools available today, it’s hard to believe that representing data as a graphic was once classified as “innovation”, but it was. Histograms are a graphic presentation of the distribution and frequency of data. If you haven’t seen one recently, or don’t know the word histogram off the top of your head, it is a bar chart in which each bar represents the count of data with a defined...
Analytics Crunchy Data Warehouse
5 min read
Craig Kerstiens · Mar 26, 2025
Reducing Cloud Spend: Migrating Logs from CloudWatch to Iceberg with Postgres
As a database service provider, we store a number of logs internally to audit and oversee what is happening within our systems. When we started out, the volume of these logs was predictably low, but with scale it grew rapidly. Given the number of databases we run for users on Crunchy Bridge, the volume of these logs has grown to a sizable amount. Until last week, we retained those logs in AWS CloudWatch. Spoiler alert: this is expensive. While we have a number of strategies to drive efficiency...
Analytics Crunchy Data Warehouse
9 min read
Craig Kerstiens · Mar 18, 2025
Citus: The Misunderstood Postgres Extension
Citus is in a small class of the most advanced Postgres extensions that exist. While there are many Postgres extensions out there, few have as many hooks into Postgres or change the storage and query behavior in such a dramatic way. Most people who come to Citus arrive with the wrong assumptions. Citus turns Postgres into a sharded, distributed, horizontally scalable database (that's a mouthful), but it does so for very specific purposes. Citus, in general, is fit for these types of applications and only the...
Analytics Crunchy Data Warehouse
7 min read
Aykut Bozkurt · Mar 11, 2025
Postgres, dbt, and Iceberg: Scalable Data Transformation
Seamless integration of dbt with Crunchy Data Warehouse automates data movement between Postgres and Apache Iceberg. dbt’s modular SQL approach, combined with Iceberg’s scalable storage and Postgres’ query engine, means you can build fast, efficient, and reliable analytics with minimal complexity. Today let’s dig into an example of using dbt with Postgres and Iceberg. The steps will be: 1. Set up Iceberg tables in Crunchy Data Warehouse using real-world, real-time data from GitHub events 2. Confi...
Production Postgres Analytics
5 min read
Elizabeth Christensen · Feb 3, 2025
Indexing Materialized Views in Postgres
Materialized views are widely used in Postgres today. Many of us are working with connected systems through foreign data wrappers, separate analytics systems like data warehouses, and merging data from different locations with Postgres queries. Materialized views let you precompile a query or partial table, for both local and remote data. Materialized views are static and have to be refreshed. One of the things that can be really important for using materialized views efficiently is inde...
Analytics
19 min read
Karen Jex · Jan 9, 2025
Postgres Tuning & Performance for Analytics Data
Your database is configured for the needs of your day-to-day OLTP (online transaction processing) application workload, but what if you need to run analytics queries against your application data? How can you do that without compromising the performance of your application? Application data gradually builds up in your database over time, and at some point the business wants to glean insights from it by running analytics queries. Analytics activity, sometimes called OLAP (online analytical proces...
Analytics
11 min read
Marco Slot · Dec 17, 2024
pg_incremental: Incremental Data Processing in Postgres
Today I’m excited to introduce pg_incremental, a new open source PostgreSQL extension for automated, incremental, reliable batch processing. This extension helps you create processing pipelines for append-only streams of data, such as IoT / time series / event data workloads. Notable pg_incremental use cases include: • Creation and incremental maintenance of rollups, aggregations, and interval aggregations • Incremental data transformations • Periodic imports or exports of new data using standa...
