Introducing Crunchy Data Warehouse: A next-generation Postgres-native data warehouse. Crunchy Data Warehouse Learn more

Posts about Production Postgres

  • Indexing Materialized Views in Postgres

    Elizabeth Christensen

    Materialized views are widely used in Postgres today. Many of us are working with using connected systems through foreign data wrappers, separate analytics systems like data warehouses, and merging data from different locations with Postgres queries. Materialized views let you precompile a query or partial table, for both local and remote data. Materialized views are static and have to be refreshed.

    One of the things that can be really important for using materialized views efficiently is indexing.

    Adding indexes to Postgres in general is critical for operation and query performance. Adding indexes for materialized views is also generally recommended for a few different reasons.

    • Materialized views are typically used for larger data sets where queries or large joins benefit from being precompiled and indexes add an additional performance boost.
    • Even if an underlying table data has indexes, those indexes will not be used in a materialized view. The materialized view is saved on disk separately, so it needs a stand alone index.
    • When querying a materialized view, Postgres treats it like a regular table. There’s no special query planner for materialized views. If it is the type of data that would benefit from an index if it was table, the materialized view would benefit from an index.
    Read More
  • When Does ALTER TABLE Require a Rewrite?

    Greg Sabino Mullane

    It is rare that a Postgres table keeps the exact same structure year after year. New columns get added. Old columns get dropped. Column data types need to change. Those are all done with the ALTER TABLE command. One big drawback to these changes is that they may force a complete table rewrite. A rewrite means a completely new copy of the table is created, and then the old one is dropped. This can take a very long time for large tables. Worse, everything else is blocked/locked from using the table, so your application may need downtime.

    So which commands need a full table rewrite? Which only needs a split-second lock? The alter table documentation

    Read More
  • Running an Async Web Query Queue with Procedures and pg_cron

    Paul Ramsey

    The number of cool things you can do with the http extension is large, but putting those things into production raises an important problem.

    The amount of time an HTTP request takes, 100s of milliseconds, is 10- to 20-times longer that the amount of time a normal database query takes.

    Read More
  • Understanding the Postgres Hackers Mailing List Language

    Greg Sabino Mullane

    The Postgres hackers mailing list (pgsql-hackers@postgresql.org) is an invaluable resource for anyone wanting to contribute to the PostgreSQL code. The Postgres project does not use PRs (pull requests) or GitHub issues. So if you want to contribute an idea, or help with code reviews, the hackers mailing list is the canonical way to do so. More information on contributing is on the Postgres wiki at: https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer

    Read More
  • Announcing an Open Source Monitoring Extension for Postgres with pgMonitor

    Keith Fiske

    Crunchy Data is pleased to announce a new open source pgMonitor Extension. Crunchy Data has worked on a pgMonitor tool for several years as part of our Kubernetes

    Read More
  • Postgres Troubleshooting - DiskFull ERROR could not resize shared memory segment

    Jesse Soyland

    There’s a couple super common Postgres errors you’re likely to encounter while using this database, especially with an application or ORM. One is the PG::DiskFull: ERROR: could not resize shared memory segment. It will look something like this.

    "PG::DiskFull: ERROR: could not resize shared memory segment "/PostgreSQL.938232807" to 55334241 bytes: No space left on device"
    
    Read More
  • Parallel Queries in Postgres

    Elizabeth Christensen

    Many folks are surprised to hear that Postgres has parallel queries out of the box. This was released in small batches across a half dozen versions of Postgres, so the major fanfare for having parallelism got a little bit lost.

    By default Postgres is configured for two parallel workers. The Postgres query planner will assemble a few plans for any given query and will estimate the additional overhead of performing parallel queries, and make a go or no-go decision. Depending on the settings and the calculations of the query planner, parallel queries are typically used by large and long running queries — like warehouse or analytical workloads.

    A simplified diagram of the process for how Postgres decides to use the parallel query capabilities.

    Below is the output of a sample EXPLAIN

    Read More
  • Introducing pgCompare: The Ultimate Multi-Database Data Comparison Tool

    Brian Pace

    In the evolving world of data management, ensuring consistency and accuracy across multiple database systems is paramount. Whether you're migrating data, synchronizing systems, or performing routine audits, the ability to compare data across different database platforms is crucial. Enter pgCompare

    Read More
  • Data Encryption in Postgres: A Guidebook

    Greg Nokes

    When your company has decided it's time to invest in more open source, Postgres is the obvious choice. Managing databases is not new and you already have established practices and requirements for rolling out a new database. One of the big requirements we frequently help new customers with on their Postgres adoption is data encryption. While the question is simple, there's a few layers to it that determine which is the right approach for you. Here we'll walk through the pros and cons of approaches and help you identify the right path for your needs.

    Overview of At-Rest Encryption Methods

    Read More
  • Building PostgreSQL Extensions: Dropping Extensions and Cleanup

    David Christensen

    I recently created a Postgres extension which utilizes the pg_cron extension to schedule recurring activities using the cron.schedule(). Everything worked great. The only problem was when I dropped my extension, it left the cron job scheduled, which resulted in regular errors:

    2024-04-06 16:00:00.026 EST [1548187] LOG:  cron job 2 starting: SELECT bridge_stats.update_stats('55 minutes', false)
    2024-04-06 16:00:00.047 EST [1580698] ERROR:  schema "bridge_stats" does not exist at character 8
    2024-04-06 16:00:00.047 EST [1580698] STATEMENT:  SELECT bridge_stats.update_stats('55 minutes', false)
    
    Read More