Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Posts about Analytics

  • 8 min read

    Iceberg ahead! Analyzing Shipping Data in Postgres

    Marco Slot

    PostgreSQL is one of the most versatile data storage and processing tools available. We enhanced it even further by adding Iceberg tables to PostgreSQL in Crunchy Data Warehouse with a fast analytical query engine.

    What is Iceberg? Iceberg tables are stored in a compressed columnar format for fast analytics in object storage (S3). This means storage is cheap and there are no storage limits. Yet the tables are still transactional and work with nearly all PostgreSQL features. Crunchy Data Warehouse can also query or load raw data from object storage into Iceberg tables via PostgreSQL commands.

    A pattern we repeatedly see in data analytics scenarios is:

    • Use temporary or external tables to collect raw data
    • Use Iceberg as a central repository to organize data
    • Use PostgreSQL tables or materialized views for querying insights
    Read More
  • 8 min read

    Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics

    Marco Slot

    PostgreSQL is the bedrock on which many of today’s organizations are built. The versatility, reliability, performance, and extensibility of PostgreSQL make it the perfect tool for a large variety of operational workloads.

    The one area in which PostgreSQL has historically been lacking is analytics, which involves queries that summarize, filter, or transform large amounts of data. Modern analytical databases are designed to query data in data lakes in formats like Parquet

    Read More
  • 5 min read

    Easy Totals and Subtotals in Postgres with Rollup and Cube

    Elizabeth Christensen

    Postgres is being used more and more for analytical workloads. There’s a few hidden gems I recently ran across that are really handy for doing SQL for data analysis, ROLLUP and CUBE. Rollup and cube don’t get a lot of attention, but follow along with me in this post to see how they can save you a few steps and enhance your date binning

    Read More
  • 8 min read

    8 Steps in Writing Analytical SQL Queries

    Christopher Winslett

    It is never immediately obvious how to go from a simple SQL query to a complex one -- especially if it involves intricate calculations. One of the “dangers” of SQL is that you can create an executable query but return the wrong data. For example, it is easy to inflate the value of a calculated field by joining to multiple rows.

    Use Crunchy Playground to follow allow with this blog post using a Postgres terminal:

    Postgres Playground w/ Sample Data

    Read More
  • 8 min read

    4 Ways to Create Date Bins in Postgres: interval, date_trunc, extract, and to_char

    Christopher Winslett

    You followed all the best practices, your sales dates are stored in perfect timestamp format …. but now you need to get reports by day, week, quarters, and months. You need to bin, bucket, and roll up sales data in easy to view reports. Do you need a BI tool? Not yet actually. Your Postgres database has hundreds of functions that let you query data analytics by date. By using some good old fashioned SQL - you have powerful analysis and business intelligence with date details on any data set.

    In this post, I’ll walk through some of the key functions querying data by date.

    Read More
  • 4 min read

    pg_parquet: An Extension to Connect Postgres and Parquet

    Craig Kerstiens

    Today, we’re excited to release pg_parquet - an open source Postgres extension for working with Parquet files. The extension reads and writes parquet files to local disk or to S3 natively from Postgres. With pg_parquet you're able to:

    • Export tables or queries from Postgres to Parquet files
    • Ingest data from Parquet files to Postgres
    • Inspect the schema and metadata of existing Parquet files

    Code is available at: https://github.com/CrunchyData/pg_parquet/.

    Read on for more background on why we built pg_parquet or jump below to get a walkthrough of working with it.

    Read More
  • 14 min read

    Vehicle Routing with PostGIS and Overture Data

    Paul Ramsey

    The Overture Maps collection of data is enormous, encompassing over 300 million transportation segments, 2.3 billion building footprints, 53 million points of interest, and a rich collection of cartographic features as well. It is a consistent global data set, but it is intimidatingly large -- what can a person do with such a thing?

    Building cartographic products is the obvious thing, but what about the less obvious. With an analytical engine like PostgreSQL and Crunchy Bridge for Analytics, what is possible? Well turns out, a lot of things.

    Read More
  • 8 min read

    Window Functions for Data Analysis with Postgres

    Elizabeth Christensen

    SQL makes sense when it's working on a single row, or even when it's aggregating across multiple rows. But what happens when you want to compare between rows of something you've already calculated? Or make groups of data and query those? Enter window functions.

    Window functions tend to confuse people - but they’re a pretty awesome tool in SQL for data analytics. The best part is that you don’t need charts, fancy BI tools or AI to get some actionable and useful data for your stakeholders. Window functions let you quickly:

    • Calculate running totals
    • Provide summary statistics for groups/partitions of data
    • Create rankings
    • Perform lag/lead analysis, ie comparing two separate sets of data with each other
    • Compute moving/rolling averages
    Read More
  • 8 min read

    PostGIS meets DuckDB: Crunchy Bridge for Analytics goes Spatial

    Marco Slot

    Crunchy Data is excited to announce the next major feature release for Crunchy Bridge for Analytics: Geospatial Analytics.

    We have developed a variety of features to connect Postgres and PostGIS to S3 and public web servers to make spatial data access easier than ever.

    This release includes:

    • Creating an analytics table directly from a geospatial data set by providing only the URL, for ad-hoc queries and data transformations.
    • Creating a regular PostGIS table directly from a URL.
    • Automatic mapping of geospatial columns into PostGIS geometry type.
    • Support for GeoParquet, GeoJSON, Shapefile (zip), Geopackage, WKT in CSV, and more.
    • Delegate PostGIS functions and operators to DuckDB spatial for fast queries on GeoParquet.

    Together, these make Crunchy Bridge for Analytics an easy-to-use and powerful platform for working with geospatial data.

    Read More
  • 4 min read

    Postgres Materialized Views from Parquet in S3 with Zero ETL

    Marco Slot

    Data pipelines for IoT applications often involve multiple different systems. First, raw data is gathered in object storage, then several transformations happen in analytics systems, and finally results are written into transactional databases to be accessed by low latency dashboards. While a lot of interesting engineering goes into these systems, things are much simpler if you can do everything in Postgres.

    Crunchy Bridge for Analytics

    Read More