<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>Elizabeth Christensen | CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/author/elizabeth-christensen/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/author/elizabeth-christensen</link>
<image><url>https://www.crunchydata.com/build/_assets/elizabeth-christensen.png-W6WDTQFB.webp</url>
<title>Elizabeth Christensen | CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/author/elizabeth-christensen</link>
<width>3016</width>
<height>3287</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Tue, 20 Jan 2026 08:00:00 EST</pubDate>
<dc:date>2026-01-20T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Postgres Serials Should be BIGINT (and How to Migrate) ]]></title>
<link>https://www.crunchydata.com/blog/postgres-serials-should-be-bigint-and-how-to-migrate</link>
<description><![CDATA[ SERIAL and INT primary keys can overflow at about 2.1 billion values. BIGINT often costs no extra storage, and this post walks through migrating a live table with an atomic swap. ]]></description>
<content:encoded><![CDATA[ <p>Lots of us started with a Postgres database whose tables were keyed with <code>id SERIAL PRIMARY KEY</code>. For many years this was the standard Postgres way to create auto-incrementing columns. <code>SERIAL</code> is shorthand for an integer column that is automatically incremented by a sequence. However, as your data grows, <code>SERIAL</code> and <code>INT</code> columns run the risk of integer overflow as they approach 2.1 billion values.<p>We covered a lot of this in a blog post, <a href=https://www.crunchydata.com/blog/the-integer-at-the-end-of-the-universe-integer-overflow-in-postgres><em>The Integer at the End of the Universe: Integer Overflow in Postgres</em></a>, a few years ago. Since that was published we’ve helped a number of customers with this problem, and I wanted to refresh the ideas and include some troubleshooting steps that can be helpful. I also think that <code>BIGINT</code> is more cost effective than folks realize.<p><code>SERIAL</code> and <code>BIGSERIAL</code> are just shorthands that map directly to the <code>INT</code> and <code>BIGINT</code> data types. While something like <code>CREATE TABLE user_events (id SERIAL PRIMARY KEY)</code> would have been common in the past, the recommended practice now is <code>BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY</code>. <code>SERIAL</code> / <code>BIGSERIAL</code> are not part of the SQL standard, and the <code>GENERATED ALWAYS</code> keyword prevents accidental inserts of explicit values, so the database manages the sequence rather than a manual or application-supplied value.<ul><li><code>INT</code> - goes up to 2.1 billion (2,147,483,647), and covers about as many values again if you use negative numbers.
INT takes up 4 bytes per value.<li><code>BIGINT</code> - goes up to 9.22 quintillion (9,223,372,036,854,775,807) and needs 8 bytes of storage.</ul><p><strong>Serials vs UUID</strong><p>Before I continue talking about serials in Postgres, it is worth noting that Postgres also has robust <a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18#uuid-v7>UUID support</a>, including v7 which was just released. If you decide to go with <code>UUID</code>, great. This makes a ton of sense for ids that appear in URLs or are shared across systems. However, not all ids need to be UUIDs, so lots of folks still continue with serialized / incremented integers.<h2 id=cost-difference-between-int-and-bigint><a href=#cost-difference-between-int-and-bigint>Cost difference between INT and BIGINT</a></h2><p>Postgres does not pack data tightly like a text file. It writes data in aligned tuples / rows, and standard 64-bit servers require each row to line up on an 8-byte boundary. In many table layouts, <code>INT</code> and <code>BIGINT</code> consume the exact same amount of disk space. The "savings" of <code>INT</code> are often eaten by empty padding bytes.<p>Think of a sample table with a single id column:<p><code>INT</code><ul><li>Header: 24 bytes (Standard row overhead)<li>Data: 4 bytes (INT)<li>Padding: PostgreSQL adds 4 empty bytes to fill the gap so the next row starts on an 8-byte boundary.<li>Total per Row: 24 + 4 + 4 = 32 bytes</ul><p><code>BIGINT</code><ul><li>Header: 24 bytes (Standard row overhead)<li>Data: 8 bytes (BIGINT)<li>Padding: 0 bytes (Already perfectly aligned to 8 bytes).<li>Total per Row: 24 + 8 + 0 = 32 bytes</ul><p>You pay $0.00 extra for using <code>BIGINT</code>.<p>Even in the scenario where your specific column order does result in a true 4-byte increase per row for <code>BIGINT</code>, the costs are negligible. Let’s say you have 4 extra bytes per row for a billion rows: that’s just ~4 GB.
On Crunchy Bridge that’s about <strong>40 cents a month</strong> (similar on other modern clouds).<p>Using <code>BIGINT</code> instead of <code>INT</code> for id columns in a production-bound database is probably the safer bet if you’re logging anything like timestamps, page hits, or things that will be incrementing to the millions or billions. Avoiding the man hours and cost to do an in-place data type change of this nature is worth it.<h2 id=live-data-type-change-in-postgres---the-atomic-swap><a href=#live-data-type-change-in-postgres---the-atomic-swap>Live data type change in Postgres - the atomic swap</a></h2><p>Ok, let’s say I’ve convinced you to move to <code>BIGINT</code> now. Maybe you’re close to integer overflow or maybe you’re small enough that you can do this now before it becomes a bigger headache.<p>Changing a production data column type is always tricky business. The data type change needs to be done across millions or billions of rows in production, but:<ul><li>We can’t lock the table for more than a moment<li>We don’t want to take downtime<li>We need to preserve the current increments</ul><p>Luckily our support team often helps folks with these types of changes, and for this blog post I’ve collected notes and helpful tips from dozens of these projects.<p>The foundational strategy for this migration is to perform the bulk of the work asynchronously, while the application remains online, by creating a new <code>BIGINT</code> column, backfilling the data, and then performing a quick, single-transaction switchover. We like to call this changeover an atomic swap: a technique for switching a live table over to a new version of itself without taking the application offline.<p>Here is the high-level plan:<ol><li>Add a new <code>BIGINT</code> column, sequence, and a unique index.
Backfill the old id values into the new column in batches.<li>Changeover (Brief Downtime): Lock the table, complete the final backfill, drop old constraints, rename columns (<code>id</code> to <code>id_old</code>, <code>id_new</code> to <code>id</code>), and add a non-validated <code>NOT NULL</code> constraint.<li>Validate the <code>NOT NULL</code> constraint, promote the column to a Primary Key, and clean up.</ol><h3 id=set-up-the-test-environment><a href=#set-up-the-test-environment>Set Up the Test Environment</a></h3><p>I’ll provide some sample code for doing a full <code>INT</code> to <code>BIGINT</code> changeover. This will make more sense with a sample table that mimics a real-world scenario where the <code>SERIAL</code> primary key is the bottleneck. I’ve also added steps for a foreign key constraint because we see this frequently.<pre><code class=language-sql>-- 1. Create the Parent Table (Standard SERIAL / INT)
CREATE TABLE user_events (
    id SERIAL PRIMARY KEY,
    data TEXT,
    created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
);

-- 2. Create a Child Table (Foreign Key Dependency)
CREATE TABLE user_events_log (
    log_id SERIAL PRIMARY KEY,
    event_id INTEGER NOT NULL,
    log_message TEXT,
    created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now(),
    CONSTRAINT fk_user_events
        FOREIGN KEY (event_id)
        REFERENCES user_events (id)
);

-- 3. Populate Initial Data (100k rows)
INSERT INTO user_events (data, created_at)
SELECT 'Historical Data', NOW() - (random() * (interval '90 days'))
FROM generate_series(1, 100000);

INSERT INTO user_events_log (event_id, log_message)
SELECT id, 'Log entry for event ' || id
FROM user_events;

-- We start inserting rows in the background to prove the migration is "Online".  (You may need to configure pg_cron in your environment for this to work.)
CREATE EXTENSION IF NOT EXISTS pg_cron;

SELECT cron.schedule(
    'generate-events-traffic',
    '2 seconds', -- Runs every 2 seconds
    $$
    INSERT INTO user_events (data, created_at)
    SELECT 'Live Incoming Traffic', NOW()
    FROM generate_series(1, 1000);
    $$
);
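
-- Optional check, not part of the original script: a minimal sketch, assuming
-- your pg_cron version exposes the jobname column in cron.job, to confirm the
-- traffic job is registered.
SELECT jobid, jobname, schedule, active
FROM cron.job
WHERE jobname = 'generate-events-traffic';

-- Run this twice a few seconds apart; the numbers should grow between runs
-- if the background inserts are working.
SELECT count(*) AS total_rows, max(id) AS highest_id
FROM user_events;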
</code></pre><h3 id=add-the-new-bigint-columns><a href=#add-the-new-bigint-columns>Add the New BIGINT Columns</a></h3><p>We add the column allowing NULLs. Later when we create the primary key index, NULLs will not be allowed. This is a quick metadata change even to a large table. It does take a short lock on the table, but only for a tiny blip because we’re creating a new column.<pre><code class=language-sql>ALTER TABLE user_events ADD COLUMN id_new BIGINT;
ALTER TABLE user_events_log ADD COLUMN event_id_new BIGINT;
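
-- Optional check, not part of the original script: a quick look at the
-- catalog to confirm the old INT columns and the new, nullable BIGINT
-- columns now sit side by side. Assumes the tables live in a schema on
-- your search_path and that no other schema has tables with these names.
SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name IN ('user_events', 'user_events_log')
  AND column_name IN ('id', 'id_new', 'event_id', 'event_id_new')
ORDER BY table_name, ordinal_position;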
</code></pre><p>If you’re doing the full test, a trigger ensures any new rows inserted into this table will get their 'id_new' field populated automatically.<pre><code class=language-sql>CREATE OR REPLACE FUNCTION sync_id_new()
RETURNS TRIGGER AS $$
BEGIN
    NEW.id_new := NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_id_new
BEFORE INSERT ON user_events
FOR EACH ROW
EXECUTE FUNCTION sync_id_new();
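
-- Optional check, not part of the original script: confirm the trigger is
-- attached, then confirm newly inserted rows are arriving with id_new set.
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'user_events'::regclass
  AND NOT tgisinternal;

-- Before the backfill runs, any non-NULL id_new values can only have come
-- from the trigger. This is a full scan, which is fine on the test table.
SELECT count(*) AS rows_synced_by_trigger
FROM user_events
WHERE id_new IS NOT NULL;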
</code></pre><h3 id=backfill-in-batches><a href=#backfill-in-batches>Backfill in batches</a></h3><p>Now we can backfill the new column from the old one. We’ll do this in batches to avoid a massive transaction that could cause replication lag or I/O spikes.  We write this as a PROCEDURE to allow us to specify the batch size and sleep time, also allowing it to COMMIT between batches.<pre><code class=language-sql>CREATE OR REPLACE PROCEDURE backfill_id_new(batch_size INTEGER, sleep_time FLOAT)
AS $$
DECLARE
    rows_updated BIGINT := 0;
    max_id_to_process BIGINT;
BEGIN
    -- We define a "high water mark" so we don't chase the moving target forever.
    -- We know any rows higher than this value will not need to be backfilled.

    SELECT MAX(id) INTO max_id_to_process FROM user_events;

    LOOP
        WITH rows_to_update AS (
            SELECT id
            FROM user_events
            WHERE id_new IS NULL
            AND id &lt;= max_id_to_process
            LIMIT batch_size
            FOR UPDATE SKIP LOCKED
        )
        UPDATE user_events m
        SET id_new = r.id
        FROM rows_to_update r
        WHERE m.id = r.id;

        GET DIAGNOSTICS rows_updated = ROW_COUNT;

        COMMIT;

        EXIT WHEN rows_updated = 0;

        PERFORM pg_sleep(sleep_time);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Run Backfill on the main table
CALL backfill_id_new(1000, 0.5);

-- Backfill Child Table (We are using a simple update for our test script, in
-- practice you would use the same batching approach in prod)
UPDATE user_events_log SET event_id_new = event_id;
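
-- Optional progress check, not part of the original script: a minimal sketch
-- for seeing how much backfill work remains. Both counts scan the whole
-- table, so on a large production table you may prefer to run this sparingly.
SELECT count(*) FILTER (WHERE id_new IS NULL) AS rows_remaining,
       count(*) AS rows_total
FROM user_events;

SELECT count(*) FILTER (WHERE event_id_new IS NULL) AS rows_remaining,
       count(*) AS rows_total
FROM user_events_log;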
</code></pre><h3 id=batch-vacuum><a href=#batch-vacuum>Batch vacuum</a></h3><p>For a small test like this one, you don’t need to vacuum, but as we’ve found with larger production moves, regularly running <code>VACUUM</code> is crucial during the backfill process to clean up dead rows created by the <code>UPDATE</code> statements. Each <code>UPDATE</code> writes a new row version that holds both the INT and BIGINT values and leaves the old version behind as a dead row, so cleaning these up as you go prevents table bloat.<pre><code class=language-sql>-- Run this command after every 5-10 backfill batches (e.g., every 500,000 rows)
VACUUM (ANALYZE, VERBOSE) user_events;
VACUUM (ANALYZE, VERBOSE) user_events_log;
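
-- Optional check, not part of the original script: the statistics views give
-- an approximate count of dead rows, which can help decide whether another
-- VACUUM is due. The numbers are estimates and may lag slightly.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('user_events', 'user_events_log');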
</code></pre><p>You may need to play around with your batch size. Instead of using very large batch sizes (e.g., 4 million rows), stick to a smaller, efficient size (like 100,000 rows). In our experience, the overall time for a smaller batch <em>plus</em> a vacuum proved to be more efficient than a single massive batch followed by a prolonged vacuum.<h3 id=create-an-index-concurrently><a href=#create-an-index-concurrently>Create an index concurrently</a></h3><p>We can now create the necessary unique index, which will eventually enforce the primary key constraint. Using <code>CONCURRENTLY</code> is the key to maintaining uptime.<pre><code class=language-sql>-- 1. Mark the new column NOT NULL (the primary key will require it later).
-- This fails if any id_new is still NULL, and it scans the table while holding a lock.
ALTER TABLE user_events
ALTER COLUMN id_new SET NOT NULL;

-- 2. Create the unique index CONCURRENTLY (non-locking DML)
CREATE UNIQUE INDEX CONCURRENTLY user_events_id_new_idx ON user_events (id_new);
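
-- Optional check, not part of the original script: if CREATE INDEX
-- CONCURRENTLY is interrupted or fails, it leaves an INVALID index behind
-- that must be dropped and rebuilt. This confirms the index is usable.
SELECT c.relname AS index_name, i.indisvalid, i.indisunique
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname = 'user_events_id_new_idx';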
</code></pre><h3 id=final-catch-up-and-sequence-configuration><a href=#final-catch-up-and-sequence-configuration>Final catch-up and sequence configuration</a></h3><p>Before the final swap, we perform a quick update on any rows inserted since the initial backfill and configure the sequence to start from the highest existing ID.<pre><code class=language-sql>-- 1. Catch-up: Update any rows that were inserted during the batch backfill
-- This should be fast, as it only targets newly inserted rows (id_new IS NULL)
UPDATE user_events
SET id_new = id
WHERE id_new IS NULL;

-- 2. Get the new sequence ready to continue from the largest existing ID
-- SERIAL uses an underlying sequence. We rename it to use for IDENTITY
ALTER SEQUENCE user_events_id_seq RENAME TO user_events_id_identity_seq;

-- Set the sequence to the current max value of the old ID (plus a buffer, e.g., 1000)
SELECT setval('user_events_id_identity_seq', (SELECT MAX(id) FROM user_events) + 1000, false);
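
-- Optional check, not part of the original script: compare the sequence
-- position with the highest id in the table. last_value should be at or
-- above max_id so that new ids cannot collide with existing rows.
SELECT s.last_value,
       s.is_called,
       (SELECT max(id) FROM user_events) AS max_id
FROM user_events_id_identity_seq AS s;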
</code></pre><h3 id=updating-foreign-key-columns-on-the-child-table><a href=#updating-foreign-key-columns-on-the-child-table>Updating foreign key columns on the child table</a></h3><p>Any table that has a foreign key referencing the primary table's ID column must be updated to <code>BIGINT</code> before the main table's switchover is completed. 🫠<p>This process is simpler as these columns are not primary keys, but it still requires a process of adding a new <code>BIGINT</code> foreign key column, backfilling, and performing a quick rename switchover for each referencing table. The trick here is adding the constraint as <code>NOT VALID</code> and making it valid later.<pre><code class=language-sql>-- 1. Enforce NOT NULL on Parent
ALTER TABLE user_events ALTER COLUMN id_new SET NOT NULL;

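-- 2. Create Unique Index Concurrently (this prepares the future Primary Key without locking writes).
-- IF NOT EXISTS makes this a no-op if you already built the index in the earlier step.
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS user_events_id_new_idx ON user_events (id_new);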

-- 3. Add Foreign Key Constraint to Child (NOT VALID)
ALTER TABLE user_events_log
    ADD CONSTRAINT fk_user_events_new
    FOREIGN KEY (event_id_new)
    REFERENCES user_events (id_new)
    NOT VALID;

-- 4. Validate FK (Scans table, but does not block parent updates)
ALTER TABLE user_events_log VALIDATE CONSTRAINT fk_user_events_new;

-- Done before the parent swap. Brief exclusive lock on child table only.

BEGIN;
    LOCK TABLE user_events_log IN ACCESS EXCLUSIVE MODE;

    -- Drop old FK and column
    ALTER TABLE user_events_log DROP CONSTRAINT fk_user_events;
    ALTER TABLE user_events_log DROP COLUMN event_id;

    -- Rename new column/constraint to match old names
    ALTER TABLE user_events_log RENAME COLUMN event_id_new TO event_id;
    ALTER TABLE user_events_log RENAME CONSTRAINT fk_user_events_new TO fk_user_events;
COMMIT;
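
-- Optional check, not part of the original script: confirm the child table
-- is left with a single, validated foreign key pointing at the parent's
-- new BIGINT column.
SELECT conname, convalidated, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'user_events_log'::regclass
  AND contype = 'f';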
</code></pre><h3 id=the-atomic-swap-brief-lock><a href=#the-atomic-swap-brief-lock>The atomic swap (brief lock)</a></h3><p>If you followed along for the sake of testing, stop your cron job with <code>SELECT cron.unschedule('generate-events-traffic');</code>.<p>This is the final step, done inside a single transaction. It requires an exclusive lock, but since the index is already built, this step is purely metadata and should take milliseconds.<pre><code class=language-sql>BEGIN;
    LOCK TABLE user_events IN ACCESS EXCLUSIVE MODE;

    -- Drop the Sync Trigger (We don't need it after the swap)
    DROP TRIGGER trg_sync_id_new ON user_events;
    DROP FUNCTION sync_id_new;

    -- Drop old PK constraint
    ALTER TABLE user_events DROP CONSTRAINT user_events_pkey;

    -- Make the new column active, drop old one
    ALTER TABLE user_events DROP COLUMN id;
    ALTER TABLE user_events RENAME COLUMN id_new TO id;

    -- Add IDENTITY (Creates a fresh sequence automatically)
    ALTER TABLE user_events
    ALTER COLUMN id ADD GENERATED ALWAYS AS IDENTITY;

    -- Sync the new Sequence to the Data
    SELECT setval(pg_get_serial_sequence('user_events', 'id'), (SELECT MAX(id) FROM user_events));

    -- Re-add Primary Key (Using the pre-built index)
    -- Postgres will automatically rename the index 'user_events_id_new_idx' to 'user_events_pkey'
    ALTER TABLE user_events
    ADD CONSTRAINT user_events_pkey PRIMARY KEY USING INDEX user_events_id_new_idx;

COMMIT;
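
-- Optional check, not part of the original script: confirm the swap left
-- id as a BIGINT identity column backed by the primary key.
SELECT column_name, data_type, is_nullable, is_identity, identity_generation
FROM information_schema.columns
WHERE table_name = 'user_events'
  AND column_name = 'id';

SELECT conname, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'user_events'::regclass
  AND contype = 'p';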
</code></pre><h2 id=conclusion><a href=#conclusion>Conclusion</a></h2><p>BIGINT is cheap! You might want to do a migration soon.<p>Migrating a sequencing column from <code>INT</code> to <code>BIGINT</code> is a complex database refactoring project, but by utilizing Postgres features like unique indexes, sequences, and the <code>NOT VALID</code> constraint trick, it can be executed with minimal application downtime.<p>Key Takeaways:<ul><li>Do as much work as possible (adding the new column, building the index, backfilling) while the application is online.<li>Test batch sizes and vacuum frequency to arrive at an efficient backfill process.<li>Update referencing foreign key columns to BIGINT <em>before</em> the main table switch.<li>Atomic switchover: Execute the column rename and constraint setup in a single, quick transaction.</ul><p>As always, test the entire process on a non-production fork and ensure the plan works as expected before committing to production. ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">451deab9e40fcfdba428e5a2ab9dde922c5a819d70e38f6ee9224c974df80c3f</guid>
<pubDate>Tue, 20 Jan 2026 08:00:00 EST</pubDate>
<dc:date>2026-01-20T13:00:00.000Z</dc:date>
<atom:updated>2026-01-20T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres Scan Types in EXPLAIN Plans ]]></title>
<link>https://www.crunchydata.com/blog/postgres-scan-types-in-explain-plans</link>
<description><![CDATA[ What is a sequential scan vs index scan vs parallel scan .... and what is a bitmap heap scan? Postgres scan types explained and diagrammed. ]]></description>
<content:encoded><![CDATA[ <p>The secret to unlocking performance gains often lies not just in <em>what</em> you ask in a query, but in <em>how</em> Postgres finds the answer. The Postgres <code>EXPLAIN</code> system is great for understanding how data is being queried. One of the secrets to reading <code>EXPLAIN</code> plans is understanding the <strong>type of scan</strong> done to retrieve the data. The scan type can be the difference between a lightning-fast response and a slow query.<p><img alt="postgres explain plan"loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/fb4c4eb8-e74c-4f68-8981-76ffbc6be300/public><p>Today I’ll break down the most common scan types, how they work, and when you’ll see them in your queries.<h2 id=sequential-scan><a href=#sequential-scan>Sequential scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/46e6aee0-8119-4a09-14fc-90890cf30e00/public" alt="postgres sequential scan, seq scan" style="float: left; margin: 0 15px 10px 0;"><p>This type of data scan reads the entire table, row by row, checking to see what matches the query conditions. If the query has a <code>WHERE</code> condition, Postgres checks every row against it, which shows up as a <code>Filter</code> line in the plan.<p>Sequential scans are kind of the foundation of how scans are done, and for many searches this is what Postgres will use. For very large data sets, or those queried often, sequential scans are not ideal and an index scan may be faster. For that reason, knowing how to spot a seq scan vs an index scan is one of the most important parts of reading an <code>EXPLAIN</code> plan.<pre><code class=language-sql>EXPLAIN select * from accounts;

QUERY PLAN
-------------------------------------------------------------
Seq Scan on accounts  (cost=0.00..22.70 rows=1270 width=36)
(1 row)
</code></pre><p><br><br><br><br><br><h2 id=index-scan><a href=#index-scan>Index Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/ba98105a-7268-4cf4-e2cc-0b12eec5ee00/public" alt="postgres index scan" style="float: left; margin: 0 15px 10px 0;"><p>When you create an index in Postgres, you’re creating a column or multi-column reference that is stored on disk. Postgres is able to use this index as a map to the data stored in the table. A basic index scan uses a B-tree to quickly find the exact location of the data using a two-step process: first Postgres finds the entry in the index, then it follows the reference in that entry to fetch the rest of the row data from the table.<pre><code class=language-sql>EXPLAIN select * from accounts where id = '5';

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using accounts_pkey on accounts  (cost=0.15..2.37 rows=1 width=36)
   Index Cond: (id = 5)
(2 rows)
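
-- A sketch, not from the original post: to see which indexes the planner has
-- available for a table (including the one created automatically for the
-- primary key), you can list them from the pg_indexes view.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'accounts';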
</code></pre><p>Note that primary keys are automatically indexed with a B-tree index, so queries that involve a primary key may use an index scan.<p>An index scan is typically faster than a sequential scan in Postgres when a query needs to retrieve only a very small fraction of rows from a large table. Using the index is faster than scanning the whole table.<p>However, index scans are <strong>not</strong> always faster. In many situations, Postgres’ query planner will correctly choose a sequential scan. This is typically the case when the table being scanned is small or when the share of rows returned is large enough to outweigh the benefit of the index. As a rough rule of thumb, if a query returns around 10% or more of a table’s rows, a sequential scan is probably faster. <br><br><br><h2 id=bitmap-index-scan><a href=#bitmap-index-scan>Bitmap Index Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/b2489008-9acf-4c48-2ad5-d570e3848800/public" alt="bitmap index scan" style="float: left; margin: 0 15px 10px 0;"><p>If neither an index scan nor a seq scan is the perfect option, Postgres can use the bitmap index scan as a kind of hybrid approach. It is typically chosen when a query matches too many rows for a regular index scan, but not so many that a sequential scan would be the best option.<p>This shows up in an EXPLAIN plan as a two-phased approach.<ol><li><strong>Bitmap Index Scan:</strong> First, Postgres scans one or more indexes to create an in-memory "bitmap", a simple map of all the table pages that <em>might</em> contain rows you need.<li><strong>Bitmap Heap Scan:</strong> The bitmap is used to visit the main table. The key here is that it reads the required pages from the disk sequentially, which can be much faster than the random jumping of a standard index scan.</ol><p>Bitmap index scans are common when a query has multiple filter conditions that each have a separate index. The bitmap scan allows the database to use separate indexes on different columns simultaneously.
You’ll see this scan come up with <code>WHERE</code> conditions joined by <code>AND</code> or <code>OR</code> operators.<pre><code class=language-sql>EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, registration_date
FROM customer_records
WHERE gender = 'F'
  AND state_code = 'NY';
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on customer_records  (cost=835.78..8669.29 rows=49226 width=12) (actual time=5.717..38.642 rows=50184.00 loops=1)
   Recheck Cond: (state_code = 'NY'::bpchar)
   Filter: (gender = 'F'::bpchar)
   Rows Removed by Filter: 49682
   Heap Blocks: exact=6370
   Buffers: shared hit=6370 read=87
   ->  Bitmap Index Scan on idx_customer_state  (cost=0.00..823.48 rows=97567 width=0) (actual time=4.377..4.378 rows=99866.00 loops=1)
         Index Cond: (state_code = 'NY'::bpchar)
         Index Searches: 1
         Buffers: shared read=87
 Planning:
   Buffers: shared hit=27 read=2
 Planning Time: 0.774 ms
 Execution Time: 40.572 ms
(14 rows)
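
-- A sketch, not from the original post: the plan above uses a single index
-- plus a filter. To try combining two indexes, you could index the second
-- column and join the conditions with OR. idx_customer_gender is a
-- hypothetical index name. Depending on the table's statistics, the planner
-- may combine both bitmaps under a BitmapOr node, or it may still choose a
-- different plan entirely.
CREATE INDEX IF NOT EXISTS idx_customer_gender ON customer_records (gender);

EXPLAIN
SELECT customer_id, registration_date
FROM customer_records
WHERE gender = 'F'
   OR state_code = 'NY';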
</code></pre><p><br><br><br><h2 id=parallel-sequential-scan><a href=#parallel-sequential-scan>Parallel Sequential Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/7375434b-f942-4cd9-bf9a-07a775df9600/public" alt="parallel seq scan" style="float: left; margin: 0 15px 10px 0;"><p>You will see a parallel sequential scan when Postgres uses multiple background workers to perform more than one sequential scan on a single large table <em>at the same time</em>. The table is broken into chunks, and each worker gets a chunk to scan, and the results are combined at the end in a gather process. Depending on your query - you may also have an aggregate or sort after the parallel queries and before the final gather. This is part of <a href=https://www.crunchydata.com/blog/parallel-queries-in-postgres>Postgres’ parallel query function</a>.<pre><code class=language-sql>EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT id, data_value
FROM parallel_test
WHERE data_value &lt; 100000
ORDER BY data_value DESC
LIMIT 1000;

                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=161310.11..161431.04 rows=1000 width=16) (actual time=130.300..140.555 rows=1000.00 loops=1)
   Output: id, data_value
   Buffers: shared hit=142685
   ->  Gather Merge  (cost=161310.11..220311.14 rows=487915 width=16) (actual time=130.299..140.468 rows=1000.00 loops=1)
         Output: id, data_value
         Workers Planned: 5
         Workers Launched: 5
         Buffers: shared hit=142685
         ->  Sort  (cost=160310.04..160553.99 rows=97583 width=16) (actual time=112.942..112.973 rows=861.17 loops=6)
               Output: id, data_value
               Sort Key: parallel_test.data_value DESC
               Sort Method: top-N heapsort  Memory: 163kB
               Buffers: shared hit=142685
               Worker 0:  actual time=112.535..112.571 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 164kB
                 Buffers: shared hit=21729
               Worker 1:  actual time=112.271..112.308 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 164kB
                 Buffers: shared hit=21573
               Worker 2:  actual time=112.465..112.500 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 164kB
                 Buffers: shared hit=20549
               Worker 3:  actual time=99.099..99.133 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 163kB
                 Buffers: shared hit=17033
               Worker 4:  actual time=112.333..112.368 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 163kB
                 Buffers: shared hit=19964
               ->  Parallel Seq Scan on public.parallel_test  (cost=0.00..154959.67 rows=97583 width=16) (actual time=19.238..99.868 rows=83250.83 loops=6)
                     Output: id, data_value
                     Filter: (parallel_test.data_value &lt; '100000'::numeric)
                     Rows Removed by Filter: 750082
                     Buffers: shared hit=142500
                     Worker 0:  actual time=18.837..99.169 rows=83026.00 loops=1
                       Buffers: shared hit=21692
                     Worker 1:  actual time=18.594..99.301 rows=84378.00 loops=1
                       Buffers: shared hit=21536
                     Worker 2:  actual time=18.706..99.551 rows=79196.00 loops=1
                       Buffers: shared hit=20512
                     Worker 3:  actual time=5.308..86.023 rows=81187.00 loops=1
                       Buffers: shared hit=16996
                     Worker 4:  actual time=18.694..99.497 rows=83574.00 loops=1
                       Buffers: shared hit=19927
 Planning:
   Buffers: shared hit=15
 Planning Time: 0.315 ms
 Execution Time: 140.635 ms
(47 rows)
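
-- A sketch, not from the original post: these standard settings influence
-- whether and how widely Postgres parallelizes a scan. The values on your
-- server will differ from the test system that produced the plan above.
SHOW max_parallel_workers_per_gather;
SHOW min_parallel_table_scan_size;

-- To compare against a non-parallel plan, disable parallel workers for the
-- current session only, re-run the query, then restore the setting.
SET max_parallel_workers_per_gather = 0;

EXPLAIN
SELECT id, data_value
FROM parallel_test
WHERE data_value &lt; 100000
ORDER BY data_value DESC
LIMIT 1000;

RESET max_parallel_workers_per_gather;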
</code></pre><p><br><br><br><h2 id=parallel-index-scan><a href=#parallel-index-scan>Parallel index scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/a406e8b5-b14f-47d3-4fcf-32f9a236c300/public" alt="parallel index scan" style="float: left; margin: 0 15px 10px 0;"><p>A parallel index scan uses the same parallel workers to scan through an index concurrently. This uses the same methodology as the index scan, except that multiple workers are doing it simultaneously. Each process reads a different part of the index and returns results. Like the other parallel scans, this ends in a gather.<p>You will see a parallel index scan when the indexes and tables involved are very large, and the overall operation of splitting things up and gathering them at the end is faster than handing the job to a single worker.<pre><code class=language-sql>EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT data_id, filler_text
FROM parallel_index_test
WHERE data_id BETWEEN 1000000 AND 2000000;

                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.43..34560.34 rows=995971 width=109) (actual time=1.014..145.796 rows=1000001.00 loops=1)
   Output: data_id, filler_text
   Workers Planned: 4
   Workers Launched: 4
   Buffers: shared hit=23385
   ->  Parallel Index Scan using idx_data_id on public.parallel_index_test  (cost=0.43..33564.37 rows=248993 width=109) (actual time=0.941..38.211 rows=200000.20 loops=5)
         Output: data_id, filler_text
         Index Cond: ((parallel_index_test.data_id >= 1000000) AND (parallel_index_test.data_id &lt;= 2000000))
         Index Searches: 1
         Buffers: shared hit=23385
         Worker 0:  actual time=2.104..45.540 rows=240638.00 loops=1
           Buffers: shared hit=5640
         Worker 1:  actual time=2.174..45.169 rows=240096.00 loops=1
           Buffers: shared hit=5638
         Worker 2:  actual time=0.067..45.380 rows=242658.00 loops=1
           Buffers: shared hit=5693
         Worker 3:  actual time=0.306..45.122 rows=242292.00 loops=1
           Buffers: shared hit=5686
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.526 ms
 Execution Time: 180.660 ms
(22 rows)
</code></pre><p><br><br><br><h2 id=index-only-scan><a href=#index-only-scan>Index-Only Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/d561c4a4-5a31-4982-c7c5-9f322a327100/public" alt="postgres index only scan" style="float: left; margin: 0 15px 10px 0;"><p>An Index-Only Scan is the superstar of scans and answers the entire query using <em>only</em> the information stored within the index itself. An index that can answer a query this way is often called a “covering index”, meaning the index itself covers all the data the query needs. When the table’s visibility map shows the relevant pages are all-visible, Postgres never even has to touch the main table, which appears as <code>Heap Fetches: 0</code> in the plan. Index only scans are a huge performance win because they’re very fast: no information needs to be retrieved from the heap table. They also typically use less i/o because indexes are very cache friendly and often in shared buffers, meaning no data needs to be read from the underlying disk.<p>Queries benefit from a covering index in these situations:<ul><li>The query is very frequently executed.<li>The current query is performing a standard index scan followed by many slow disk reads (heap fetches) and using i/o.<li>The query only requires a small subset of the table's columns, for example you select only three columns from a table of twenty.<li>The columns have a low write frequency. Any column that is indexed must be written to both the table and the index, so if you start adding covering indexes for all your columns you’re essentially creating write amplification.<li>The new index, which must cover all needed columns, won't be excessively large. Indexes are stored on disk so you don’t want to cause storage issues.</ul><pre><code class=language-sql>EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT code, status
FROM index_only_test
WHERE code > 'CODE_050000'
ORDER BY code
LIMIT 100;
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..2.60 rows=100 width=13) (actual time=0.346..0.362 rows=100.00 loops=1)
   Output: code, status
   Buffers: shared hit=1 read=3
   ->  Index Only Scan using idx_code_status on public.index_only_test  (cost=0.42..1068.02 rows=49000 width=13) (actual time=0.345..0.352 rows=100.00 loops=1)
         Output: code, status
         Index Cond: (index_only_test.code > 'CODE_050000'::text)
         Heap Fetches: 0
         Index Searches: 1
         Buffers: shared hit=1 read=3
 Planning:
   Buffers: shared hit=19
 Planning Time: 1.838 ms
 Execution Time: 0.385 ms
(13 rows)
</code></pre><p><br><br><br><h2 id=summary><a href=#summary>Summary</a></h2><p>We’ve covered all the major scan types so now reading your <code>EXPLAIN</code> plans will be a little easier.<ul><li>Seq scan - Postgres looks through the whole table in sequential order to find the query data<li>Index scan - Postgres first looks at the index and then fetches the row data the index points to<li>Bitmap index scan - Postgres first reads the index and creates a <strong>bitmap</strong> of matching rows. Second, Postgres reads the heap using the bitmap, which is more efficient than a sequential scan.<li>Parallel scan - Postgres uses multiple parallel workers to scan the table and gathers the data at the end<li>Parallel index scan - Postgres uses multiple workers to do an index scan and gathers the data at the end<li>Index only scan - All data for the query is in the index</ul><p>And here’s everything all in one graphic:<p><img alt="postgres index only scan" loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/e886cea5-4785-4136-ba99-ff46a3b03000/original> ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">08ee92c4d2dfc4ad6be90a57493965c9cdb5a8e3c06cedc4fd8eddfb425c08c9</guid>
<pubDate>Thu, 04 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-04T13:00:00.000Z</dc:date>
<atom:updated>2025-12-04T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres Internals Hiding in Plain Sight ]]></title>
<link>https://www.crunchydata.com/blog/postgres-internals-hiding-in-plain-sight</link>
<description><![CDATA[ Get under the hood of Postgres by looking at psql, system views, and internal tables. ]]></description>
<content:encoded><![CDATA[ <p>Postgres has an awesome amount of data collected in its own internal tables. Postgres hackers know all about this - but software developers and folks working with day to day Postgres tasks often miss out on the good stuff.<p>The Postgres catalog is how Postgres keeps track of itself. Of course, Postgres would do this in a relational database with its own schema. Throughout the years several nice features have been added around the internal tables, like psql tools and views, that make navigating them even easier.<p>Today I want to walk through some of the most important Postgres internal data catalog details: what they are, what is in them, and how they might help you understand more about what is happening inside your database.<h2 id=psqls-catalog-information><a href=#psqls-catalog-information>psql’s catalog information</a></h2><p>The easiest way to get at some of Postgres’ internal catalogs is to use the built-in <a href=https://www.crunchydata.com/developers/playground/psql-basics>psql commands</a>, most of which begin with \d. Here are some common ones users should be comfortable using:<p><code>\d {tablename}</code>: describes a specific table. 
\d will do a lot of things if you qualify \d with a table or view name.<p><code>\di</code>: list all your indexes<p><code>\dx</code>: list installed extensions<p><code>\dp</code>: show access privileges<p><code>\dp+</code>: tables and views with the roles and access details<p><code>\dconfig</code>: your current configuration settings<p><code>\dt {tablename}</code>: list tables matching a name<p><code>\dti+</code>: tables and indexes with sizes<p><code>\dg+</code>: show roles and their attributes<p><code>\df</code>: show your functions<p><code>\dv {view name}</code>: list views matching a name<p><code>\l</code>: lists all your databases<h2 id=important-postgres-catalog-views><a href=#important-postgres-catalog-views>Important Postgres catalog views</a></h2><p>Postgres exposes many of the complex internals of the database system in easy-to-query views. These hold a wealth of information about what is going on inside your database and give you direct SQL access to answer in-the-moment emergency questions like “what is taking up all my CPU” and more long-term questions like “what are my 10 slowest queries”.<h3 id=pg_stat_activity><a href=#pg_stat_activity>pg_stat_activity</a></h3><p>Shows current database activity, including running queries, state, and client information. Essential for troubleshooting and getting process ids (pid) for bad actors.<pre><code class=language-sql>SELECT pid, usename, datname, client_addr, application_name, state, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY state, query_start DESC;
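-- Follow-up sketch (an addition, not part of the original query): once you have the pid of a bad actor,
-- pg_cancel_backend() cancels its current query; pg_terminate_backend() would end the whole session instead.
-- 12345 is a placeholder pid, swap in one returned by the query above.
SELECT pg_cancel_backend(12345);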
</code></pre><h3 id=pg_stat_statements><a href=#pg_stat_statements>pg_stat_statements</a></h3><p>This requires the pg_stat_statements extension. It is part of the contrib library that ships with Postgres, so there is nothing extra to download, but it does need to be listed in <code>shared_preload_libraries</code> and enabled with <code>CREATE EXTENSION</code>.<p>This view tracks execution statistics for queries executed across all databases. It's incredibly powerful for identifying slow or frequently executed queries.<pre><code class=language-sql>-- pg_stat_statements top 10 queries by total execution time
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
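-- Setup sketch (an addition, not part of the original query): run once before querying the view.
-- Assumes pg_stat_statements is already listed in shared_preload_libraries, which needs a server restart.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;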
</code></pre><h3 id=pg_stat_database><a href=#pg_stat_database>pg_stat_database</a></h3><p>This view provides database-wide statistics, such as the number of connections, transactions, and I/O. It's useful for a high-level overview of database activity and health.<pre><code class=language-sql>-- high level db stats for the postgres db
SELECT datname,numbackends, xact_commit, xact_rollback, blks_read, blks_hit
FROM pg_stat_database
WHERE datname = 'postgres';
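-- A rough sketch (an addition, not from the original post): derive a cache hit percentage per database
-- from the same blks_hit and blks_read counters. nullif() avoids dividing by zero on idle databases.
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname IS NOT NULL;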
</code></pre><h3 id=pg_locks><a href=#pg_locks>pg_locks</a></h3><p>This view displays information about locks held by active processes. This is the go-to place for troubleshooting locking issues, deadlocks, and contention within the database. We have a great blog on locking and <a href=https://www.crunchydata.com/blog/one-pid-to-lock-them-all-finding-the-source-of-the-lock-in-postgres>how to find the source of the lock in Postgres</a>.<pre><code class=language-sql>-- locks joined with the activity table. Shows locks that have not been granted, typically because they are blocked by other locks
SELECT a.datname, l.pid, l.locktype, l.relation::regclass, l.mode, l.granted
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE NOT l.granted;
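-- A complementary sketch (an addition, not from the original post): the built-in pg_blocking_pids()
-- function returns the pids blocking a given backend, so this lists each waiting session and its blockers.
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;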
</code></pre><h3 id=pg_stat_user_tables><a href=#pg_stat_user_tables>pg_stat_user_tables</a></h3><p>This view offers statistics on tables, including sequential scans, index scans, and row-level operations (inserts, updates, deletes). It's great for identifying tables with heavy activity or those that need vacuuming.<pre><code class=language-sql>-- see sequential scans and index scans by table
SELECT relname AS table_name, seq_scan, idx_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0 OR idx_scan > 0 ORDER BY seq_scan DESC;
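-- A sketch for the "need vacuuming" case (an addition, not from the original post):
-- the same view tracks dead tuples and the last autovacuum run. The LIMIT of 10 is arbitrary.
SELECT relname AS table_name, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;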
</code></pre><h3 id=pg_stat_user_indexes><a href=#pg_stat_user_indexes>pg_stat_user_indexes</a></h3><p>This view provides statistics on user indexes, such as how often they're used and how many tuples are read. This is particularly helpful for finding unused or underutilized indexes.<pre><code class=language-sql>-- Never used indexes in Postgres sorted by size
SELECT s.schemaname, s.relname AS table_name, s.indexrelname AS index_name, pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size, s.idx_scan
FROM pg_stat_user_indexes AS s
JOIN pg_index AS i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0 AND i.indisunique IS FALSE
ORDER BY pg_relation_size(s.indexrelid) DESC;
</code></pre><h3 id=pg_settings><a href=#pg_settings>pg_settings</a></h3><p>This is a prebuilt view that is super useful for viewing configuration parameters, their current values, and their descriptions. Filter with <code>LIKE</code> or <code>ILIKE</code> to see the exact parameters you’re looking for.<pre><code class=language-sql>-- find shared_buffers or work_mem settings
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name LIKE '%shared_buffers%' OR name LIKE '%work_mem%';
</code></pre><h3 id=pg_roles><a href=#pg_roles>pg_roles</a></h3><p>This view describes all system roles, which include users and groups. It's useful for checking permissions, login capabilities, and role memberships.<pre><code class=language-sql>-- This query lists all roles, showing their names, whether they can log in, and their password expiration date.
SELECT rolname, rolcanlogin, rolvaliduntil
FROM pg_roles
ORDER BY rolname;
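-- A sketch for the role membership case (an addition, not from the original post):
-- pg_auth_members links each role to the roles that are members of it.
SELECT r.rolname AS role_name, m.rolname AS member_name
FROM pg_auth_members am
JOIN pg_roles r ON r.oid = am.roleid
JOIN pg_roles m ON m.oid = am.member
ORDER BY role_name, member_name;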
</code></pre><h3 id=pg_database><a href=#pg_database>pg_database</a></h3><p>This catalog contains a row for every database in the cluster. It provides key metadata for each database, including its owner, character encoding, and access privileges. We have a lot of folks now that create dozens and sometimes hundreds of databases for development, so this is a good high level view.<pre><code class=language-sql>-- This query lists all Postgres databases, their sizes, and owners.
SELECT d.datname AS database_name, pg_size_pretty(pg_database_size(d.datname)) AS database_size, pg_get_userbyid(d.datdba) AS owner
FROM pg_database AS d
WHERE d.datistemplate = false;
</code></pre><h2 id=postgres-catalog-tables><a href=#postgres-catalog-tables>Postgres catalog tables</a></h2><p>Behind the Postgres metacommands and views, there are several core catalog tables. Many of the psql commands match up with the catalog tables. Something roughly like this:<table><thead><tr><th><strong>psql command</strong><th><strong>what data</strong><th><strong>catalog tables</strong><tbody><tr><td>\d<td>tables and table objects<td>pg_class<tr><td>\di<td>indexes<td>pg_class, pg_index<tr><td>\dx<td>installed extensions<td>pg_extension<tr><td>\dp<td>tables and privileges<td>pg_class, pg_roles, pg_attribute<tr><td>\l<td>databases<td>pg_database<tr><td>\df<td>available functions<td>pg_proc</table><p>Let’s look at these and how you might want to use them.<h3 id=pg_stats><a href=#pg_stats>pg_stats</a></h3><p>pg_stats is a view over the pg_statistic catalog that collects all the details about your columns - things like cardinality - are there many distinct items in this column or a few? The query planner uses a lot of the details in pg_stats to decide how to execute queries efficiently. In some cases, giving <a href=https://www.crunchydata.com/blog/hacking-the-postgres-statistics-tables-for-faster-queries>pg_stats more information can make your queries faster.</a><pre><code class=language-sql>-- table column data like cardinality
SELECT * FROM pg_stats
WHERE tablename = 'table_name'
AND attname = 'column_name';
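-- One possible way to give pg_stats more information (a sketch, not necessarily the approach in the linked post):
-- raise the sampling target for a single column, then re-collect statistics.
-- 500 is an arbitrary illustrative value; the default target is 100.
ALTER TABLE table_name ALTER COLUMN column_name SET STATISTICS 500;
ANALYZE table_name;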
</code></pre><h3 id=pg_class><a href=#pg_class>pg_class</a></h3><p>pg_class contains a row for every table, index, sequence, view, materialized view, and other "relation-like" objects in the database. Sometimes this is a nice high level view of an entire table’s accoutrements.<pre><code class=language-sql>SELECT c.relname, pg_get_userbyid(c.relowner) AS owner
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relkind = 'r'
ORDER BY c.relname;
</code></pre><h3 id=pg_type><a href=#pg_type>pg_type</a></h3><p>This table stores all data types that exist. It's confusing though - in Postgres, every table has an associated composite type that defines the structure of its rows. So if you do a <code>select *</code> you’ll see all the table names here and all the data types. If you filter a bit, you can see all your custom data types, domains, and enums.<pre><code class=language-sql>-- see your custom data types in Postgres
SELECT
    t.typname AS type_name,
    n.nspname AS schema_name,
    t.typtype AS type_class
FROM
    pg_type AS t
JOIN
    pg_namespace AS n ON t.typnamespace = n.oid
LEFT JOIN pg_class c ON typrelid = c.oid
WHERE
    t.typtype IN ('e', 'd', 'c') -- 'e' for enum, 'd' for domain, 'c' for composite types.
    AND n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
    AND (t.typtype &#60> 'c' OR c.relkind = 'c')
ORDER BY
    schema_name, type_name;
</code></pre><h3 id=pg_proc><a href=#pg_proc>pg_proc</a></h3><p>This is the catalog of all functions and stored procedures that Postgres can use. It contains metadata about each routine. Made a function last week but can’t find it now? Just scan through all of them.<pre><code class=language-sql>-- This query lists all functions (including trigger functions) and stored procedures.
SELECT proname AS function_name, proargnames AS argument_names, pg_catalog.format_type(prorettype, NULL) AS return_type
FROM pg_proc
ORDER BY proname;
</code></pre><h3 id=pg_attribute><a href=#pg_attribute>pg_attribute</a></h3><p>This table stores information about table columns and there is one row in <code>pg_attribute</code> for every column in every table. Indexes and other objects that have an entry in <code>pg_class</code> also have rows here.<p>Query columns and data types for any table with a query like this:<pre><code class=language-sql>SELECT
    a.attname AS column_name,
    pg_catalog.format_type(a.atttypid, a.atttypmod) AS data_type
FROM
    pg_catalog.pg_attribute a
WHERE
    a.attrelid = 'orders'::regclass
    AND a.attnum > 0
    AND NOT a.attisdropped
ORDER BY
    a.attnum;
</code></pre><h3 id=pg_catalog-schema><a href=#pg_catalog-schema>pg_catalog schema</a></h3><p>The pg_catalog is the schema holding the system tables. Postgres searches <code>pg_catalog</code> implicitly even when it is not listed in your <code>search_path</code> (the default), so catalog queries work unqualified. You can also qualify any catalog name with <code>pg_catalog</code> to be explicit.<p>Here’s a summary of the internal catalog tables:<table><thead><tr><th>pg_catalog<th>schema holding all the catalog tables<tbody><tr><td>pg_stats<td>table and column statistics, like cardinality<tr><td>pg_attribute<td>row for every table column<tr><td>pg_class<td>every table, index, view, materialized view, foreign table<tr><td>pg_type<td>data types, built in and custom</table><h2 id=exploring-system-tables-with-echo_hidden-or--e><a href=#exploring-system-tables-with-echo_hidden-or--e>Exploring system tables with <code>ECHO_HIDDEN</code> or <code>-E</code></a></h2><p>Sometimes navigating these tables and views can be confusing and require browsing through a mix of docs and source code. If you want to have some fun exploring how the catalog is connected, you can connect to your database with the <code>-E</code> argument to psql (or run <code>\set ECHO_HIDDEN on</code> if you’re already connected). psql will echo the SQL behind each backslash command that's run, so you can grab the underlying SQL and edit from there.<p>For example, echoing <code>\dt+</code> will show me a query and the results.<pre><code class=language-sql>SELECT n.nspname as "Schema",
  c.relname as "Name",
  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 't' THEN 'TOAST table' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'partitioned table' WHEN 'I' THEN 'partitioned index' END as "Type",
  pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
  CASE c.relpersistence WHEN 'p' THEN 'permanent' WHEN 't' THEN 'temporary' WHEN 'u' THEN 'unlogged' END as "Persistence",
  am.amname as "Access method",
  pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) as "Size",
  pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     LEFT JOIN pg_catalog.pg_am am ON am.oid = c.relam
WHERE c.relkind IN ('r','p','')
      AND n.nspname &#60> 'pg_catalog'
      AND n.nspname !~ '^pg_toast'
      AND n.nspname &#60> 'information_schema'
  AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;

List of tables
-[ RECORD 1 ]-+--------------
Schema        | public
Name          | articles
Type          | table
Owner         | dba
Persistence   | permanent
Access method | heap
Size          | 16 kB
Description
</code></pre><h2 id=getting-to-postgres-internals><a href=#getting-to-postgres-internals>Getting to Postgres internals</a></h2><ol><li>The easiest way to see internals is to start with the psql <code>\d</code> commands.<li>The prebuilt views like <code>pg_stat_activity</code>, <code>pg_stat_statements</code>, <code>pg_locks</code>, and <code>pg_stat_user_indexes</code> are ready to go for easy querying and searching.<li>Going a step deeper, you can access the underlying internal Postgres tables, housed in the pg_catalog schema. <code>-E</code> or <code>ECHO_HIDDEN</code> can help you see the tables involved by echoing the SQL behind psql commands.</ol> ]]></content:encoded>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">2d460799803e47f9fd2ff2a2f2306f8836e6b53ce0d7f4ba8bebfce8072fab07</guid>
<pubDate>Fri, 07 Nov 2025 08:00:00 EST</pubDate>
<dc:date>2025-11-07T13:00:00.000Z</dc:date>
<atom:updated>2025-11-07T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres’ Original Project Goals: The Creators Totally Nailed It ]]></title>
<link>https://www.crunchydata.com/blog/the-postgres-project-original-goals-and-how-the-creators-totally-nailed-it</link>
<description><![CDATA[ Dig in to the original goals of the Postgres academic project at UC Berkeley and how they shaped the Postgres we use today. ]]></description>
<content:encoded><![CDATA[ <p>I had a chance last week to sit down and read the <a href=https://dsf.berkeley.edu/papers/ERL-M85-95.pdf>original academic paper announcing Postgres</a> as a platform and the original design goals from 1986. I was just awestruck at the forethought - and how the original project goals laid the foundation for the database that seems to be taking over the world right now.<p>The PostgreSQL creators totally nailed it. They laid out a flexible framework for a variety of business use cases that would eventually become the most popular database decades later.<p>The paper outlines 6 project goals:<ol><li><p>better support for complex objects for a growing world of business and engineering use cases<li><p>provide user extendibility for data types, operators and access methods<li><p>provide facilities for active databases like alerters and triggers<li><p>simplify process for crash recovery<li><p>take advantage of upgraded hardware<li><p>utilize Codd’s relational model</ol><p>Let's look at all of them in reference to modern features of Postgres.<h2 id=1-objects-and-data-types-for-a-growing-world-of-business-and-engineering-use-cases><a href=#1-objects-and-data-types-for-a-growing-world-of-business-and-engineering-use-cases>1) Objects and data types for a growing world of business and engineering use cases</a></h2><p>Postgres has a rich and flexible set of native data types that are designed to meet a vast array of business use cases, from simple record-keeping to complex data analysis.<p>Numeric Types like <code>SMALLINT</code> and <code>INTEGER</code> are used for whole numbers while <code>BIGINT</code> might be for a user's unique ID or primary keys. Exact types like <code>NUMERIC</code> and <code>DECIMAL</code> are used where exact precision is critical, especially for <a href=https://www.crunchydata.com/blog/working-with-money-in-postgres>money in Postgres</a>. 
Floating-Point Types like <code>REAL</code> or <code>DOUBLE PRECISION</code> can be used for scientific or engineering calculations where absolute precision isn't as important as the range of values. You also have your <code>UUID</code> (<a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18>indexable UUIDs</a> in Postgres 18) for distributed systems and secure URLs.<p>Character Types like <code>VARCHAR(n)</code> store variable-length text up to a specified maximum length (n) and only use as much storage as needed for the actual text, while <code>CHAR(n)</code> pads values with blanks to a fixed length.<p>Date/Time Types like <code>DATE</code> store only the date (year, month, day). <a href=https://www.crunchydata.com/blog/working-with-time-in-postgres><code>TIMESTAMPTZ</code></a> is the time and date GOAT and is easily implemented into global systems.<p>But, wait, that’s not all. Postgres has within it the ability to easily make <strong>custom data types</strong> and constrain data to the specifics of each use case.<p><a href=https://www.crunchydata.com/blog/intro-to-postgres-custom-data-types#using-create-domain>Using CREATE DOMAIN</a> you can create specific value checks, like confirming a range for a birthday or the validity of an email format.<pre><code class=language-sql>-- Postgres create domain
CREATE DOMAIN date_of_birth AS date
CHECK (value > '1930-01-01'::date);

CREATE DOMAIN valid_email AS text
NOT NULL
CHECK (value ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$');
</code></pre><p>Or using a direct <code>CREATE TYPE</code> you can make a new type as a composite. For example, a new custom data type allowing for storage of height, width, and weight in a single field.<pre><code class=language-sql>-- Postgres create type with composite
CREATE TYPE physical_package AS (
height numeric,
width numeric,
weight numeric);
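
-- Usage sketch (an addition, not from the original post): the shipments table is hypothetical.
-- A composite value is built with ROW(...) and a single field is read with (column).field.
CREATE TABLE shipments (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
package physical_package);

INSERT INTO shipments (package) VALUES (ROW(10, 20, 1.5));

SELECT (package).weight FROM shipments;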
</code></pre><p><a href=https://www.crunchydata.com/blog/enums-vs-check-constraints-in-postgres><code>Enums</code></a> let you create a custom type with a set of predefined values.<pre><code class=language-sql>-- Postgres enum
CREATE TYPE order_status AS ENUM (
'pending',
'shipped',
'cancelled');
</code></pre><p>Constraints take the enumerated type a bit further and let you specify rules and restrictions for data. A <code>CHECK</code> constraint can limit a column to a list of values or even refer to other fields, like requiring a room reservation’s start time to come before its end time.<pre><code class=language-sql>-- Postgres check constraint
ALTER TABLE public.reservations
ADD CONSTRAINT start_before_end
CHECK (start_time &#60 end_time);
</code></pre><p>While most applications will constrain data in their own way, Postgres’ strict and flexible typing allows both rigid validity and flexibility.<h2 id=2-extensibility-for-data-types-operators-and-access-methods><a href=#2-extensibility-for-data-types-operators-and-access-methods>2) Extensibility for data types, operators and access methods</a></h2><p>The authors knew that just data types wouldn’t be enough - the system would actually need to be extensible. In my estimation - this is actually the killer feature of Postgres. Sure, the database is solid - but the ingenuity and enthusiasm of the extension ecosystem is incredibly special.<p>Let’s take PostGIS for example. This extension adds several key data types to the mix - the point, line, and polygon - to store geospatial data. PostGIS also comes with hundreds of functions. There’s now an entire ecosystem of its own around this project that includes open-source mapping and fully open source web servers that rival paid GIS systems like ESRI.<p>The <code>pgvector</code> extension is another good example of Postgres extensibility. Now <a href=https://www.crunchydata.com/blog/whats-postgres-got-to-do-with-ai>Postgres can store embedding data</a> right alongside application data. You can have LLMs create embeddings based on your data and you can query your data to find relatedness. You can also build your own <a href=https://www.crunchydata.com/blog/smarter-postgres-llm-with-retrieval-augmented-generation>Postgres RAG</a> system right inside your database.<pre><code class=language-sql>-- find distance between two embedding values
recipe_1.embedding &#60=> recipe_2.embedding
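
-- Query sketch (an addition, not from the original post), assuming a hypothetical
-- recipes(id, name, embedding) table where embedding is a pgvector column.
-- Returns the five recipes closest to recipe 1 by pgvector's cosine distance operator.
SELECT r2.name
FROM recipes r1
JOIN recipes r2 ON r2.id != r1.id
WHERE r1.id = 1
ORDER BY r1.embedding &#60=> r2.embedding
LIMIT 5;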
</code></pre><p>Data types and extensions aren’t the only thing that came out of this idea though - the indexes themselves in Postgres are incredibly advanced. Generalized Inverted Index (GIN) and Generalized Search Tree (GiST) are themselves extensible indexing frameworks that support many of the complex data types mentioned above.<h2 id=3-features-for-active-databases-like-alerters-and-triggers><a href=#3-features-for-active-databases-like-alerters-and-triggers>3) Features for active databases like alerters and triggers</a></h2><p>Modern Postgres users have a suite of tools available to them to have the database do necessary work. The trigger system easily updates fields once another field changes.<pre><code class=language-sql>-- Postgres sample function to update fields
CREATE OR REPLACE FUNCTION update_inventory_on_sale()
RETURNS TRIGGER AS $$
BEGIN
UPDATE products
SET quantity_on_hand = quantity_on_hand - NEW.quantity_sold
WHERE id = NEW.product_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'No product found with ID %', NEW.product_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
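
-- Wiring sketch (an addition, not from the original post): a trigger function only runs once it is
-- attached to a table. The sales table and trigger name are hypothetical; the function reads
-- NEW.product_id and NEW.quantity_sold, so the table is assumed to have those columns.
CREATE TRIGGER trg_update_inventory_on_sale
AFTER INSERT ON sales
FOR EACH ROW
EXECUTE FUNCTION update_inventory_on_sale();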
</code></pre><p>For events outside the database, Postgres has a handy little <code>NOTIFY/LISTEN</code> mechanism for sending notifications to the outside so your application or dashboard will know when a new order was placed or a specific action happened. There’s an extension now to use the <a href=https://www.crunchydata.com/blog/real-time-database-events-with-pg_eventserv>listen notify system events as WebSockets</a>.<p>Postgres’ <a href=https://www.crunchydata.com/blog/data-to-go-postgres-logical-replication>logical replication</a> makes use of the ‘active database’ idea. PostgreSQL's logical replication is cool because it streams individual data changes rather than physical block-level copies, allowing you to replicate data between different major Postgres versions or even different platforms. This flexibility enables powerful use cases like creating specialized read replicas, consolidating multiple databases into a central one, and performing zero-downtime major version upgrades.<pre><code class=language-sql>-- Postgres create logical replication
CREATE PUBLICATION user_pub FOR TABLE user_id, forum_posts;
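
-- Subscriber-side sketch (an addition, not from the original post): run this on the receiving database,
-- which is assumed to already have matching table definitions.
-- The subscription name and every connection value below are placeholders.
CREATE SUBSCRIPTION user_sub
CONNECTION 'host=primary.example.com dbname=app user=replicator'
PUBLICATION user_pub;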
</code></pre><h2 id=4-simplify-process-for-crash-recovery><a href=#4-simplify-process-for-crash-recovery>4) Simplify process for crash recovery</a></h2><p>The original method of Postgres data recovery relied on writing all data modifications to the files on disk before each commit, which was called "force-to-disk". Unfortunately this original implementation had major performance issues and a potential for corruption. The Write Ahead Log (WAL), which was released with version 7.1, changed this into a different system that first writes changes to a log file and then applies those changes to the main data files.<p>WAL is the foundation of Postgres’ amazing backup and disaster recovery story. WAL is used to create incremental backups, complete with the <a href=https://www.crunchydata.com/blog/database-terminology-explained-postgres-high-availability-and-disaster-recovery#disaster-recovery-is-about-more-than-just-availability>Point-in-Time disaster recovery</a> system that many rely on today.<p>WAL is also foundational to Postgres streaming replication, which makes high availability possible. A primary writes all database changes (inserts, updates, deletes) into its Write-Ahead Log and then "streams" these WAL records over the network to the standby (replica) nodes. The standby nodes receive these WAL records and apply them to their own copy of the database, keeping them in sync with the primary. In the event of an emergency, an automated failover tool like <a href=https://github.com/patroni/patroni>Patroni</a> can promote a new primary.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/bc74acb6-3405-43f1-cee3-153c8375be00/public><h2 id=5-take-advantage-of-upgraded-hardware><a href=#5-take-advantage-of-upgraded-hardware>5) Take advantage of upgraded hardware</a></h2><p>PostgreSQL was engineered for the hardware realities of its time: single-core CPUs, severely limited RAM often measured in megabytes, and slow, spinning hard drives. 
The primary design focus was on correctness and data durability over raw speed. PostgreSQL built its legendary reputation for stability and ACID compliance, ensuring that data remained safe even when running on less reliable hardware.<p>Fast forward to today, where PostgreSQL runs on hardware with dozens of CPU cores, terabytes of ultra-fast NVMe storage and vast amounts of RAM (we even have half a TB of RAM available now). More recent versions of PostgreSQL introduced <a href=https://www.crunchydata.com/blog/parallel-queries-in-postgres>parallel query execution</a>, which breaks up complex queries and runs them simultaneously, gathering the results at the end. Modern PostgreSQL has also vastly improved its locking mechanisms, connection pooling solutions, and replication capabilities, evolving from a robust single-server database into a high-performance powerhouse that can scale horizontally and handle the massive, concurrent workloads of the modern internet.<p>While Postgres today is not yet <a href=https://wiki.postgresql.org/wiki/Multithreading>multi-threaded</a>, this is on the horizon, and Postgres 18 just added <a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18>asynchronous i/o</a>.<h2 id=6-utilize-codds-relational-model><a href=#6-utilize-codds-relational-model>6) Utilize Codd’s relational model</a></h2><p>At the height of the NoSQL movement in the late 2000s and early 2010s, a common story was told that relational databases were a relic of the past. With the rise of big and unstructured data, the old model would soon be cast out.<p>Postgres continued to do what it always has done and embraced its core strength - flexibility of data typing – and adopted some of NoSQL’s own ideas. Postgres introduced the JSON data type and then later the binary, <a href=https://www.crunchydata.com/blog/indexing-jsonb-in-postgres>indexable JSONB</a> type. 
With this update, applications can now store schema-less API driven JSON data directly in a relational database and query it efficiently using a rich set of operators and functions. With features like <a href=https://www.crunchydata.com/blog/easily-convert-json-into-columns-and-rows-with-json_table><code>json_table</code></a>, you can move between JSON arrays and traditional tables.<p>The newest revolution in the Postgres world seems to be the adoption of technologies to tie Postgres directly to unstructured flat files. Projects like pg_duckdb, pg_mooncake, and <a href=https://www.crunchydata.com/products/warehouse>Crunchy Data Warehouse</a> use custom extensions to work with CSV, Parquet, and Iceberg files directly in the remote data lake object stores where they reside. Even with the data abstracted to another location, Postgres’ relational model is still relevant, efficient, and trusted.<h2 id=summary><a href=#summary>Summary</a></h2><p>With Postgres’ flexibility - you can have a fully normalized, relational schema with foreign keys and JOINs, while also having an indexed JSONB document and full spatial geometry. We’re at a point in history where AI, science, and research are backed by a database that had no idea what the world would be like when it was built. Postgres is still here.<p>These original goals have had a profound impact on the project: allowing for complexity and flexibility in a growing business landscape, while being easy to alter for individual use cases, and being ready for hardware (and cloud) technology that makes Postgres’ distribution even easier. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">1e8a25ee9384198a8eb616a19c72c4d852e451563c2209f890ca1e7d8545a4ac</guid>
<pubDate>Tue, 23 Sep 2025 09:00:00 EDT</pubDate>
<dc:date>2025-09-23T13:00:00.000Z</dc:date>
<atom:updated>2025-09-23T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Get Excited About Postgres 18 ]]></title>
<link>https://www.crunchydata.com/blog/get-excited-about-postgres-18</link>
<description><![CDATA[ New to Postgres 18, features like asynchronous i/o, uuid v7, b-tree skip scans, and virtual generated columns. ]]></description>
<content:encoded><![CDATA[ <p>Postgres 18 will be released in just a couple of weeks! Here are some details on the most important and exciting features.<h2 id=asynchronous-io><a href=#asynchronous-io>Asynchronous i/o</a></h2><p>Postgres 18 is adding asynchronous i/o. This means faster reads for many use cases. This is also part of a bigger series of performance improvements planned for future Postgres, part of which may be multi-threading. Expect to see more on this in coming versions.<p><strong>What is async I/O?</strong><p>When <a href=https://www.crunchydata.com/blog/postgres-data-flow>data</a> isn’t in the shared memory buffers already, Postgres reads from disk, and <a href=https://www.crunchydata.com/blog/understanding-postgres-iops>I/O is needed to retrieve data</a>. Synchronous I/O means that Postgres waits for each individual disk request to complete before moving on to something else. For busy databases with a lot of activity, this can be a bottleneck.<p>Postgres 18 will introduce asynchronous I/O, allowing workers to optimize idle time and improve system throughput by batching reads. Until now, Postgres has relied on the operating system for intelligent I/O handling, expecting OS or storage read-ahead for sequential scans and using features like Linux's posix_fadvise for other read types like Bitmap Index Scans. Moving this work into the database with asynchronous I/O will provide a more predictable and better-performing method for batching operations at the database level. Additionally, a new system view, pg_aios, will be available to provide data about the asynchronous I/O system.<p>Postgres writes will continue to be synchronous, since this is needed for ACID compliance.<p>If async i/o seems confusing, think of it like ordering food at a restaurant. In a synchronous model, you would place your order and stand at the counter, waiting, until your food is ready before you can do anything else. 
In an asynchronous model, you place your order, receive a buzzer, and are free to go back to your table and chat with friends until the buzzer goes off, signaling that your food is ready to be picked up.<p>Async I/O will affect:<ul><li>sequential scans<li>bitmap heap scans (following the bitmap index scan)<li>some maintenance operations like VACUUM.</ul><p>By default Postgres will turn on <strong>io_method = worker</strong>. By default there are 3 workers (the <strong>io_workers</strong> setting) and this can be adjusted up for systems with more CPU cores. I haven’t seen any reliable recommendations on this, so stay tuned for more on that from our team soon.<p>For Postgres running on Linux 5.1+ you can utilize the io_uring system calls and have the invocations made via the actual backends rather than having separate processes with the optional <strong>io_method = io_uring</strong>.<h2 id=uuid-v7><a href=#uuid-v7>UUID v7</a></h2><p>UUIDs are getting a bit of an overhaul in this version by moving to v7.<p>UUIDs are randomly generated strings which are globally unique and often used for primary keys. UUIDs are popular in modern applications for a couple reasons:<ul><li>They’re unique: You can use keys generated from more than one place.<li>Decoupled: Your application can generate a primary key <em>before</em> sending the data to the database.<li>URL obscurity: If your URLs use primary keys (e.g., .../users/5), other URLs are easy to guess (.../users/6, .../users/7). With a UUID (.../users/f47ac10b-58cc-4372-a567-0e02b2c3d479), it's practically impossible to guess other IDs.</ul><p>A new standard for UUID v7 came out in mid-2024 via a series of standards updates. UUIDv4 was the prior version of uuid with native Postgres support. But sorting and indexing in large tables had performance issues due to the relative randomness, leading to fragmented indexes and bad locality. UUIDv7 helps with the sort and indexing issues. 
It is still partly random, but the first 48 bits (12 hex characters) are a timestamp and most of the remaining bits are random; this gives better locality for data inserted around the same time and thus better indexability.<p>The timestamp part is written in hexadecimal. So for example a uuid that begins with <code>01896d6e4a5d</code> (hex) would represent <code>1689758091869</code> (decimal), which is the number of milliseconds since 1970.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/2bf43dd0-9a3a-4535-55c0-5f18a9a9a200/public><p>This is how the DDL will look for uuid v7:<pre><code class=language-sql>CREATE TABLE user_actions (
action_id UUID PRIMARY KEY DEFAULT uuidv7(),
user_id BIGINT NOT NULL,
action_description TEXT,
action_time TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- No separate index is needed on action_id:
-- the PRIMARY KEY constraint already creates a unique B-tree index on it.
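
-- Illustrative usage, not from the original post (sample values are made up).
-- uuid_extract_timestamp() reads back the timestamp embedded in a v7 UUID,
-- and ordering by the key should roughly follow insertion order.
INSERT INTO user_actions (user_id, action_description) VALUES (1, 'login');
SELECT action_id, uuid_extract_timestamp(action_id) AS embedded_time
FROM user_actions
ORDER BY action_id;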
</code></pre><h2 id=b-tree-skip-scans><a href=#b-tree-skip-scans>B-tree skip scans</a></h2><p>There’s a nice performance bump coming in Postgres 18 for some multi-column B-tree indexes.<p>In Postgres, if you have an index on columns (<code>status</code>, <code>date</code>) in a table, this index can be used to match queries which filter on both the <code>status</code> and <code>date</code> fields, or just <code>status</code>.<p>In Postgres 17 and below, this same index cannot be used efficiently to answer queries against just the <code>date</code> field; you would have to have that column indexed separately or the database would resort to a sequential scan + filter approach if there were no appropriate indexes for that table.<p>In Postgres 18, in many cases it can automatically use this multi-column index for queries touching only the <code>date</code> field. Known as a skip scan, this lets the system "skip" over portions of the index.<p>This works when queries don’t use the leading columns in the conditions and the omitted column has a low cardinality, meaning a small number of distinct values. The optimization works by:<ol><li>Identifying all the distinct values in the omitted leading column(s).<li>Effectively transforming the query to add conditions that match those leading values.<li>Using existing infrastructure to optimize lookups across multiple leading columns, effectively skipping any pages in the index scan which do not match both conditions.</ol><p>For example, if we had a sales table with columns <code>status</code> and <code>date</code>, we might have a multi-column index:<pre><code class=language-sql>CREATE INDEX idx_status_date
ON sales (status, date);
</code></pre><p>An example query could have a where clause that doesn’t include status.<pre><code class=language-sql>SELECT * FROM sales
WHERE date = '2025-01-01';
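
-- A hedged way to check which plan Postgres picks for this query
-- (not from the original post; the plan will vary with table size and statistics):
EXPLAIN SELECT * FROM sales
WHERE date = '2025-01-01';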
</code></pre><p>Nothing in the query plan tells you this is a skip scan, so you’ll end up with a normal Index scan like this, showing you the index conditions.<pre><code class=language-sql>                                QUERY PLAN
-------------------------------------------------------------
 Index Only Scan using idx_status_date on sales  (cost=0.29..21.54 rows=4 width=8)
   Index Cond: (date = '2025-01-01'::date)
(2 rows)
</code></pre><p>Before 18, a full table scan would be done, since the leading column of the index is not included, but with skip scan Postgres can use the same index for this index scan.<p>In Postgres 18, because status has a low cardinality and just a few values, a compound index scan can be done. Note that this optimization only works for queries which use the <code>=</code> operator, so it will not work with inequalities or ranges.<p>This all happens behind-the-scenes in the Postgres planner so you don’t need to turn it on. The idea is that it will benefit analytics use cases where filters and conditions often change and aren’t necessarily related to existing indexes.<p>The query planner will decide if using a skip scan is worthwhile, based on the table's statistics and the number of distinct values in the columns being skipped.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/6d5ed16d-2a24-4ff4-4a6c-fd42773e4b00/public><h2 id=generated-columns-on-the-fly><a href=#generated-columns-on-the-fly>Generated columns on-the-fly</a></h2><p>PostgreSQL 18 introduces virtual generated columns. Previously, generated columns were always stored on disk. This meant that values were computed at the time of an insert or update, adding a bit of write overhead.<p>In PostgreSQL 18, virtual generated columns are now the default type for generated columns. If you define a generated column without explicitly specifying STORED, it will be created as a virtual generated column.<pre><code class=language-sql>CREATE TABLE user_profiles (
user_id SERIAL PRIMARY KEY,
settings JSONB,
username VARCHAR(100) GENERATED ALWAYS AS (settings ->> 'username') VIRTUAL
);
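
-- Illustrative usage, not from the original post (the sample value is made up):
-- the virtual column is not stored; it is computed from settings when the row is read.
INSERT INTO user_profiles (settings) VALUES ('{"username": "alice"}');
SELECT user_id, username FROM user_profiles;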
</code></pre><p>This is a great update for folks using JSON data: queries can be simplified and data changes or normalization can be done on the fly as needed.<p>Note that virtual generated columns are not indexable, since they’re not stored on disk. For <a href=https://www.crunchydata.com/blog/indexing-jsonb-in-postgres>indexing of JSONB</a>, use the stored version or an expression index.<h2 id=oauth-20><a href=#oauth-20>OAuth 2.0</a></h2><p>Good news for folks that use Okta, Keycloak, and other managed authentication services: Postgres is now compatible with OAuth 2.0. This is specified in the main host based authentication configuration (pg_hba.conf) file.<p>The OAuth system uses bearer tokens, where the client application presents a token instead of a password to prove identity. The token is an opaque string and its format is determined by the authorization server. This feature removes the need to store passwords in the database. It also allows for more robust security measures like multi-factor authentication (MFA) and single sign-on (SSO) to be managed by external identity providers.<h2 id=postgres-versions-are-packed-with-other-improvements><a href=#postgres-versions-are-packed-with-other-improvements>Postgres versions are packed with other improvements</a></h2><p>Postgres 18 comes with a staggering 3,000 commits from more than 200 authors. While many of these are features, there are numerous additions and optimizations under the hood to the Postgres query planner and other parts of the system that are behind the scenes. Even if you don’t utilize optional features, there are still performance benefits (uh ... async i/o is a biggie), bug fixes, and security patches that make upgrading on a regular cadence a good idea. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">0fe99b43c2417b308d641253451cc38618f70b171a295266a2dd8108b823f133</guid>
<pubDate>Fri, 12 Sep 2025 08:00:00 EDT</pubDate>
<dc:date>2025-09-12T12:00:00.000Z</dc:date>
<atom:updated>2025-09-12T12:00:00.000Z</atom:updated></item></channel></rss>