Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

  • Choosing a PostgreSQL Number Format

    Paul Ramsey

    It should be the easiest thing in the world: you are modeling your data and you need a column for some numbers, what type do you use?

    PostgreSQL offers a lot of different number types, and they all have advantages and limitations. You want the number type that is going to:

    • Store your data using the smallest amount of space
    • Represent your data with the smallest amount of error
    • Manipulate your data using the correct logic
    Read More
  • Data To Go: Postgres Logical Replication

    Elizabeth Christensen

    Logical replication has been around since Postgres 10 and it's just one of the most useful things out there. Need to get your data to a business analytics engine? Want to create an ETL pipeline from Postgres to your data warehouse? Want to test a new version of Postgres locally with your data? Want to consolidate a few databases into one? Logical replication can do all those things!

    It is a replication tool, which means it will copy some or all of your data to a new location and continue updating data from the primary source. This allows you to have accurate copies of your data in different locations in perpetuity.

    There are two types of replication in Postgres, so just at a super high level some notes for you:

    • Streaming replication

    Read More
  • Postgres Locking: When is it Concerning?

    David Christensen

    When using monitoring tools like PgMonitor or pganalyze, Crunchy clients will often ask me about high numbers of locks and when to worry. Like most engineering-related questions, the answer is: "it depends".

    In this post, I will provide a little more information about locks, how they are used in PostgreSQL, and what things to look for to spot problems vs high usage.

    PostgreSQL uses locks in all parts of its operation to serialize or share access to key data. This can come in the form of two basic types of locks: shared or exclusive.

    • Shared locks

    Read More
  • 14 min read

    Snap Clones with the Postgres Operator on Kubernetes

    Brian Pace

    Getting frequent copies of data for development and testing purposes is a very common use case with our enterprise customer base. We have customers getting data copes for each sprint or development cycle. This increasing data copy problem can put a strain on IT budgets with the storage consumed and the hours spent performing database refreshes.

    A common process is to build or refresh environments using database restores. There can be some challenges with this approach. First, the data must be moved from the backup location to newly provisioned storage. Moving large amounts of data around is time consuming and expensive. Another process is to perform storage snapshots of the Postgres data volumes. The risk with this approach is the snapshot could be corrupted or unusable.

    Customers using the Crunchy Postgres for Kubernetes

    Read More
  • 6 min read

    Postgres' Clever Query Planning System

    Paul Ramsey

    The sheer cleverness of relational databases is often discounted because we so frequently use them for very simple data management tasks.

    Serialize an object into a row, store with unique key. yawwwn

    Search for unique key, deserialize row into an object. yawwwwwwn

    The real power of relational databases is juggling "relations" (aka tables) in large numbers and figuring out on-the-fly the most effective way to filter out rows and find an answer.

    diagram of query plan from payment to append to sort to merge left join

    PostgreSQL has an undeniably clever query planning system that auto-tunes based on the data in the system. It samples tables to gain statistics about the distribution of data, and uses those statistics to choose the order of joins and filters applied to the data for the most efficient query execution.

    Even more amazing, the query planning system is modular enough to integrate user-defined data types, like the geometry

    Read More
  • 9 min read

    Quick and Easy Postgres Data Compare

    Brian Pace

    If you're checking archives or working with Postgres replication, data reconciliation can be a necessary task. Row counts can be one of the go to comparison methods but that does not show data mismatches. You could pull table data across the network and then compare each row and each field, but that can be a demand on resources. Today we'll walk through a simple solution for your Postgres toolbox - using Foreign Data Wrappers to connect and compare the two source datasets. With the foreign data wrapper and a little sql magic, we can compare data quickly and easily.

    Creating Environments

    Read More
  • Phases of Database Growth and Cost

    Christopher Winslett

    TL;DR: keep your data-layer simple, you'll need the agility it offers when budgets get stretched.

    Over the past 12 years of helping people run databases for their businesses, I have discovered that there are only 2 financial phases in the lifecycle of a database:

    • Phase 1: Absolute costs are low, thus marginal costs do not matter.
    • Phase 2
    Read More
  • 5 min read

    Effective PostgreSQL Cluster Configuration & Management Using PGO v5.1

    Andrew L'Ecuyer

    Modern day production ready Postgres solutions require quite a bit of sophistication and automation. Changes need to be applied in a uniform and safe way. DevOps and SRE teams need to be in control system updates while limiting disruption to their users.

    With the release of PGO v5.1

    Read More
  • Exporter2Perfdata - Connecting Postgres Monitoring Tools

    Yogesh Sharma

    pgMonitor is a popular open source monitoring tool based on Prometheus, Exporters, and Grafana. We have many customers leveraging pgMonitor when using the Crunchy Postgres for Kubernetes

    Read More
  • Postgres Query Optimization: LEFT JOIN vs UNION ALL

    David Christensen

    Introduction

    The PostgreSQL optimizer is an amazing thing, getting only more amazing with each release. It is able to take information about your data definitions, your data distribution, constraints, and the specific queries and come up with the generally most efficient way to return the results of that query.

    Since SQL is a declarative language, we're explicitly giving up defining how

    Read More