<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog</link>
<image><url>https://www.crunchydata.com/card.png</url>
<title>CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog</link>
<width>800</width>
<height>419</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Tue, 20 Jan 2026 08:00:00 EST</pubDate>
<dc:date>2026-01-20T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Postgres Serials Should be BIGINT (and How to Migrate) ]]></title>
<link>https://www.crunchydata.com/blog/postgres-serials-should-be-bigint-and-how-to-migrate</link>
<description><![CDATA[ SERIAL and INT keys can overflow at about 2.1 billion values. BIGINT often costs no extra disk space, and a live table can be migrated to it with only a brief lock. ]]></description>
<content:encoded><![CDATA[ <p>Lots of us started with a Postgres database whose tables used an <code>id SERIAL PRIMARY KEY</code>. This was the Postgres standard for many years for columns that auto-increment. <code>SERIAL</code> is shorthand for an integer column that is automatically incremented. However, as your data grows, <code>SERIAL</code>s and <code>INT</code>s run the risk of an integer overflow as they approach 2.1 billion values.<p>We covered a lot of this in a blog post, <a href=https://www.crunchydata.com/blog/the-integer-at-the-end-of-the-universe-integer-overflow-in-postgres><em>The Integer at the End of the Universe: Integer Overflow in Postgres</em></a>, a few years ago. Since that was published we’ve helped a number of customers with this problem, and I wanted to refresh the ideas and include some troubleshooting steps that can be helpful. I also think that <code>BIGINT</code> is more cost effective than folks realize.<p><code>SERIAL</code> and <code>BIGSERIAL</code> are just shorthands that map directly to the <code>INT</code> and <code>BIGINT</code> data types. While something like <code>CREATE TABLE user_events (id SERIAL PRIMARY KEY)</code> would have been common in the past, the best practice now is <code>BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY</code>. <code>SERIAL</code> and <code>BIGSERIAL</code> are not SQL standard, and the <code>GENERATED ALWAYS</code> keyword prevents accidental inserts, guaranteeing that the database manages the sequence rather than a manual or application-supplied value.<ul><li><code>INT</code> - goes up to 2.1 billion (2,147,483,647), and more if you use negative numbers. 
<code>INT</code> takes up 4 bytes per row.<li><code>BIGINT</code> - goes up to 9.22 quintillion (9,223,372,036,854,775,807) and needs 8 bytes of storage per row.</ul><p><strong>Serials vs UUID</strong><p>Before I continue talking about serials in Postgres, it is worth noting that Postgres also has robust <a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18#uuid-v7>UUID support</a>, including v7, which was just released. If you decide to go with <code>UUID</code>, great. This makes a ton of sense for ids that appear in URLs or are shared across systems. However, not all ids need to be UUIDs, so lots of folks still use sequential, incrementing integers.<h2 id=cost-difference-between-int-and-bigint><a href=#cost-difference-between-int-and-bigint>Cost difference between INT and BIGINT</a></h2><p>Postgres does not pack data tightly like a text file. It writes data in aligned tuples (rows), and standard 64-bit servers require data to line up on 8-byte boundaries. In many table layouts, <code>INT</code> and <code>BIGINT</code> consume the exact same amount of disk space. The "savings" of <code>INT</code> are often eaten by empty padding bytes.<p>Consider a row that holds only the id column:<p><code>INT</code><ul><li>Header: 24 bytes (standard row overhead)<li>Data: 4 bytes (INT)<li>Padding: PostgreSQL adds 4 empty bytes to fill the gap so the next row starts on an 8-byte boundary.<li>Total per row: 24 + 4 + 4 = 32 bytes</ul><p><code>BIGINT</code><ul><li>Header: 24 bytes (standard row overhead)<li>Data: 8 bytes (BIGINT)<li>Padding: 0 bytes (already perfectly aligned to 8 bytes).<li>Total per row: 24 + 8 + 0 = 32 bytes</ul><p>You pay $0.00 extra for using <code>BIGINT</code>.<p>Even in the scenario where your specific column order does result in a true 4-byte increase per row for <code>BIGINT</code>, the costs are negligible. Say you have 4 extra bytes per row across a billion rows: that’s just ~4 GB. 
On Crunchy Bridge that’s about <strong>40 cents a month</strong> (similar on other modern clouds).<p>Using <code>BIGINT</code> instead of <code>INT</code> for ids in a production-bound database is probably the safer bet if you’re logging anything like timestamps, page hits, or other things that will grow into the millions or billions. Avoiding the hours and cost of an in-place data type change of this nature is worth it.<h2 id=live-data-type-change-in-postgres---the-atomic-swap><a href=#live-data-type-change-in-postgres---the-atomic-swap>Live data type change in Postgres - the atomic swap</a></h2><p>Ok, let’s say I’ve convinced you to move to <code>BIGINT</code> now. Maybe you’re close to integer overflow, or maybe you’re small enough that you can do this now before it becomes a bigger headache.<p>Changing a production column’s data type is always tricky business. The change needs to be applied across millions or billions of rows in production, but:<ul><li>We can’t lock the table for long<li>We don’t want to take downtime<li>We need to preserve the current id values</ul><p>Luckily our support team often helps folks with these types of changes, and for this blog post I’ve collected notes and helpful tips from dozens of these projects.<p>The foundational strategy for this migration is to perform the bulk of the work asynchronously, while the application remains online, by creating a new <code>BIGINT</code> column, backfilling the data, and then performing a quick, single-transaction switchover. We like to call this changeover an atomic swap: a technique for switching a live table to a new version of itself without taking the application offline.<p>Here is the high-level plan:<ol><li>Add a new <code>BIGINT</code> column, sequence, and a unique index. 
Backfill the old id values into the new column in batches.<li>Changeover (brief lock): lock the table, complete the final backfill, drop old constraints, rename columns (<code>id</code> to <code>id_old</code>, <code>id_new</code> to <code>id</code>), and add a non-validated <code>NOT NULL</code> constraint.<li>Validate the <code>NOT NULL</code> constraint, promote the column to a primary key, and clean up.</ol><h3 id=set-up-the-test-environment><a href=#set-up-the-test-environment>Set Up the Test Environment</a></h3><p>I’ll provide some sample code for doing a full <code>INT</code> to <code>BIGINT</code> changeover. This will make more sense with a sample table that mimics a real-world scenario where the <code>SERIAL</code> primary key is the bottleneck. I’ve also added steps for a foreign key constraint because we see this frequently.<pre><code class=language-sql>-- 1. Create the Parent Table (Standard SERIAL / INT)
CREATE TABLE user_events (
    id SERIAL PRIMARY KEY,
    data TEXT,
    created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now()
);

-- 2. Create a Child Table (Foreign Key Dependency)
CREATE TABLE user_events_log (
    log_id SERIAL PRIMARY KEY,
    event_id INTEGER NOT NULL,
    log_message TEXT,
    created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT now(),
    CONSTRAINT fk_user_events
        FOREIGN KEY (event_id)
        REFERENCES user_events (id)
);

-- 3. Populate Initial Data (100k rows)
INSERT INTO user_events (data, created_at)
SELECT 'Historical Data', NOW() - (random() * (interval '90 days'))
FROM generate_series(1, 100000);

INSERT INTO user_events_log (event_id, log_message)
SELECT id, 'Log entry for event ' || id
FROM user_events;

-- We start inserting rows in the background to prove the migration is "Online".  (You may need to configure pg_cron in your environment for this to work.)
CREATE EXTENSION IF NOT EXISTS pg_cron;

SELECT cron.schedule(
    'generate-events-traffic',
    '2 seconds', -- Runs every 2 seconds
    $$
    INSERT INTO user_events (data, created_at)
    SELECT 'Live Incoming Traffic', NOW()
    FROM generate_series(1, 1000);
    $$
);
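
-- Optional check, added here as a sketch rather than part of the original script:
-- see how much of its range each sequence has already used. pg_sequences is a
-- built-in view; the pct_used alias is made up for this example.
SELECT sequencename,
       last_value,
       max_value,
       round(100.0 * last_value / max_value, 2) AS pct_used
FROM pg_sequences
WHERE sequencename IN ('user_events_id_seq', 'user_events_log_log_id_seq');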
</code></pre><h3 id=add-the-new-bigint-columns><a href=#add-the-new-bigint-columns>Add the New BIGINT Columns</a></h3><p>We add the new columns allowing NULLs. Later, when we create the primary key, NULLs will not be allowed. This is a quick metadata-only change even on a large table: adding a nullable column with no default does not rewrite the table, so the lock it takes is only a tiny blip.<pre><code class=language-sql>ALTER TABLE user_events ADD COLUMN id_new BIGINT;
ALTER TABLE user_events_log ADD COLUMN event_id_new BIGINT;
</code></pre><p>If you’re doing the full test, a trigger ensures any new rows inserted into this table will get their 'id_new' field populated automatically.<pre><code class=language-sql>CREATE OR REPLACE FUNCTION sync_id_new()
RETURNS TRIGGER AS $$
BEGIN
    NEW.id_new := NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_id_new
BEFORE INSERT ON user_events
FOR EACH ROW
EXECUTE FUNCTION sync_id_new();
</code></pre><h3 id=backfill-in-batches><a href=#backfill-in-batches>Backfill in batches</a></h3><p>Now we can backfill the new column from the old one. We’ll do this in batches to avoid a massive transaction that could cause replication lag or I/O spikes.  We write this as a PROCEDURE to allow us to specify the batch size and sleep time, also allowing it to COMMIT between batches.<pre><code class=language-sql>CREATE OR REPLACE PROCEDURE backfill_id_new(batch_size INTEGER, sleep_time FLOAT)
AS $$
DECLARE
    rows_updated BIGINT := 0;
    max_id_to_process BIGINT;
BEGIN
    -- We define a "high water mark" so we don't chase the moving target forever.
    -- We know any rows higher than this value will not need to be backfilled.

    SELECT MAX(id) INTO max_id_to_process FROM user_events;

    LOOP
        WITH rows_to_update AS (
            SELECT id
            FROM user_events
            WHERE id_new IS NULL
            AND id &lt;= max_id_to_process
            LIMIT batch_size
            FOR UPDATE SKIP LOCKED
        )
        UPDATE user_events m
        SET id_new = r.id
        FROM rows_to_update r
        WHERE m.id = r.id;

        GET DIAGNOSTICS rows_updated = ROW_COUNT;

        COMMIT;

        EXIT WHEN rows_updated = 0;

        PERFORM pg_sleep(sleep_time);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- Run Backfill on the main table
CALL backfill_id_new(1000, 0.5);

-- Backfill Child Table (We are using a simple update for our test script, in
-- practice you would use the same batching approach in prod)
UPDATE user_events_log SET event_id_new = event_id;
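
-- Progress check (a sketch, not part of the original script): count the rows that
-- still need backfilling. This scans the table, so run it sparingly on large tables.
SELECT count(*) FILTER (WHERE id_new IS NULL) AS rows_remaining,
       count(*)                              AS rows_total
FROM user_events;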
</code></pre><h3 id=batch-vacuum><a href=#batch-vacuum>Batch vacuum</a></h3><p>For a small test like this one, you don’t need to vacuum, but as we’ve found with larger production moves, regularly running <code>VACUUM</code> during the backfill is crucial for cleaning up the dead rows created by the <code>UPDATE</code> statements. Each update writes a new row version holding both the INT and BIGINT values and leaves the old version behind as a dead row; vacuuming lets Postgres reuse that space and prevents table bloat.<pre><code class=language-sql>-- Run this command after every 5-10 backfill batches (e.g., every 500,000 rows)
VACUUM (ANALYZE, VERBOSE) user_events;
VACUUM (ANALYZE, VERBOSE) user_events_log;
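
-- To judge whether a vacuum is due (a sketch, not part of the original script), the
-- statistics views report estimated live and dead row counts per table.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname IN ('user_events', 'user_events_log');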
</code></pre><p>You may need to play around with your batch size. Instead of using very large batch sizes (e.g., 4 million rows), stick to a smaller, efficient size (like 100,000 rows). The overall time for a smaller batch <em>plus</em> a vacuum proved to be more efficient than a single massive batch followed by a prolonged vacuum.<h3 id=create-an-index-concurrently><a href=#create-an-index-concurrently>Create an index concurrently</a></h3><p>We can now create the necessary unique index, which will eventually enforce the primary key constraint. Using <code>CONCURRENTLY</code> is the key to maintaining uptime.<pre><code class=language-sql>-- 1. Ensure all backfilled rows are NOT NULL for the index creation
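-- Optional pre-step, an assumption beyond the original script: on Postgres 12 and later,
-- a validated CHECK (id_new IS NOT NULL) constraint lets the SET NOT NULL below skip its
-- full-table scan, so its exclusive lock is held only briefly. The constraint name
-- user_events_id_new_not_null is made up for this sketch, and the constraint can be
-- dropped once SET NOT NULL has run.
ALTER TABLE user_events
    ADD CONSTRAINT user_events_id_new_not_null CHECK (id_new IS NOT NULL) NOT VALID;
ALTER TABLE user_events VALIDATE CONSTRAINT user_events_id_new_not_null;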
ALTER TABLE user_events
ALTER COLUMN id_new SET NOT NULL;

-- 2. Create the unique index CONCURRENTLY (does not block reads or writes)
CREATE UNIQUE INDEX CONCURRENTLY user_events_id_new_idx ON user_events (id_new);
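
-- A failed or cancelled CREATE INDEX CONCURRENTLY leaves an INVALID index behind, so it
-- is worth confirming the build finished (a sketch, not part of the original script).
SELECT indexrelid::regclass AS index_name, indisvalid
FROM pg_index
WHERE indexrelid = 'user_events_id_new_idx'::regclass;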
</code></pre><h3 id=final-catch-up-and-sequence-configuration><a href=#final-catch-up-and-sequence-configuration>Final catch-up and sequence configuration</a></h3><p>Before the final swap, we perform a quick update on any rows inserted since the initial backfill and configure the sequence to start from the highest existing ID.<pre><code class=language-sql>-- 1. Catch-up: Update any rows that were inserted during the batch backfill
-- This should be fast, as it only targets newly inserted rows (id_new IS NULL)
UPDATE user_events
SET id_new = id
WHERE id_new IS NULL;

-- 2. Get the new sequence ready to continue from the largest existing ID
-- SERIAL uses an underlying sequence. We rename it to use for IDENTITY
ALTER SEQUENCE user_events_id_seq RENAME TO user_events_id_identity_seq;

-- Set the sequence to the current max value of the old ID (plus a buffer, e.g., 1000)
SELECT setval('user_events_id_identity_seq', (SELECT MAX(id) FROM user_events) + 1000, false);
</code></pre><h3 id=updating-foreign-key-columns-on-the-child-table><a href=#updating-foreign-key-columns-on-the-child-table>Updating foreign key columns on the child table</a></h3><p>Any table that has a foreign key referencing the primary table's ID column must be updated to <code>BIGINT</code> before the main table's switchover is completed. 🫠<p>This process is simpler as these columns are not primary keys, but it still requires a process of adding a new <code>BIGINT</code> foreign key column, backfilling, and performing a quick rename switchover for each referencing table. The trick here is adding the constraint as <code>NOT VALID</code> and making it valid later.<pre><code class=language-sql>-- 1. Enforce NOT NULL on Parent
ALTER TABLE user_events ALTER COLUMN id_new SET NOT NULL;

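-- 2. Create Unique Index Concurrently (this prepares the future Primary Key without locking writes).
--    IF NOT EXISTS makes this a no-op if the index was already built in the earlier step.
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS user_events_id_new_idx ON user_events (id_new);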

-- 3. Add Foreign Key Constraint to Child (NOT VALID)
ALTER TABLE user_events_log
    ADD CONSTRAINT fk_user_events_new
    FOREIGN KEY (event_id_new)
    REFERENCES user_events (id_new)
    NOT VALID;

-- 4. Validate FK (Scans table, but does not block parent updates)
ALTER TABLE user_events_log VALIDATE CONSTRAINT fk_user_events_new;
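
-- Confirm the new constraint is validated before swapping (a sketch, not part of the
-- original script); convalidated should be true for fk_user_events_new.
SELECT conname, convalidated
FROM pg_constraint
WHERE conrelid = 'user_events_log'::regclass
  AND contype = 'f';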

-- Done before the parent swap. Brief exclusive lock on child table only.

BEGIN;
    LOCK TABLE user_events_log IN ACCESS EXCLUSIVE MODE;

    -- Drop old FK and column
    ALTER TABLE user_events_log DROP CONSTRAINT fk_user_events;
    ALTER TABLE user_events_log DROP COLUMN event_id;

    -- Rename new column/constraint to match old names
    ALTER TABLE user_events_log RENAME COLUMN event_id_new TO event_id;
    ALTER TABLE user_events_log RENAME CONSTRAINT fk_user_events_new TO fk_user_events;
COMMIT;
</code></pre><h3 id=the-atomic-swap-brief-lock><a href=#the-atomic-swap-brief-lock>The atomic swap (brief lock)</a></h3><p>If you followed along for the sake of testing, stop your cron job <code>SELECT cron.unschedule('generate-events-traffic');</code>.<p>This is the final step, done inside a single transaction. It requires an exclusive lock, but since the index is already built, this step is purely metadata and should take milliseconds.<pre><code class=language-sql>BEGIN;
    LOCK TABLE user_events IN ACCESS EXCLUSIVE MODE;

    -- Drop the Sync Trigger (We don't need it after the swap)
    DROP TRIGGER trg_sync_id_new ON user_events;
    DROP FUNCTION sync_id_new;

    -- Drop old PK constraint
    ALTER TABLE user_events DROP CONSTRAINT user_events_pkey;

    -- Make the new column active, drop old one
    ALTER TABLE user_events DROP COLUMN id;
    ALTER TABLE user_events RENAME COLUMN id_new TO id;

    -- Add IDENTITY (Creates a fresh sequence automatically)
    ALTER TABLE user_events
    ALTER COLUMN id ADD GENERATED ALWAYS AS IDENTITY;

    -- Sync the new Sequence to the Data
    SELECT setval(pg_get_serial_sequence('user_events', 'id'), (SELECT MAX(id) FROM user_events));

    -- Re-add Primary Key (Using the pre-built index)
    -- Postgres will automatically rename the index 'user_events_id_new_idx' to 'user_events_pkey'
    ALTER TABLE user_events
    ADD CONSTRAINT user_events_pkey PRIMARY KEY USING INDEX user_events_id_new_idx;

COMMIT;
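
-- Post-swap sanity check (a sketch, not part of the original script): the id column
-- should now report bigint and an ALWAYS identity.
SELECT column_name, data_type, is_identity, identity_generation
FROM information_schema.columns
WHERE table_name = 'user_events'
  AND column_name = 'id';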
</code></pre><h2 id=conclusion><a href=#conclusion>Conclusion</a></h2><p>BIGINT is cheap! You might want to do a migration soon.<p>Migrating a sequencing column from <code>INT</code> to <code>BIGINT</code> is a complex database refactoring project, but by utilizing Postgres features like unique indexes, sequences, and the <code>NOT VALID</code> constraint trick, it can be executed with minimal application downtime.<p>Key Takeaways:<ul><li>Do as much work as possible (adding new column, index, backfilling) while the application is online.<li>Test batch sizes and vacuum to get to a backfill process that is efficient<li>Update referencing foreign key columns to BIGINT <em>before</em> the main table switch.<li>Atomic switchover: Execute the column rename and constraint setup in a single, quick transaction.</ul><p>As always, test the entire process on a non-production fork and ensure the plan works as expected before committing to production. ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">451deab9e40fcfdba428e5a2ab9dde922c5a819d70e38f6ee9224c974df80c3f</guid>
<pubDate>Tue, 20 Jan 2026 08:00:00 EST</pubDate>
<dc:date>2026-01-20T13:00:00.000Z</dc:date>
<atom:updated>2026-01-20T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres 18 New Default for Data Checksums and How to Deal with Upgrades ]]></title>
<link>https://www.crunchydata.com/blog/postgres-18-new-default-for-data-checksums-and-how-to-deal-with-upgrades</link>
<description><![CDATA[ Postgres 18 defaults to checksums on. This is a good feature for data integrity but might catch you off guard with an upgrade.  ]]></description>
<content:encoded><![CDATA[ <p>In a recent Postgres <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=04bec894a04">patch</a> authored by Greg Sabino Mullane, Postgres has a new step forward for data integrity: <strong>data checksums are now enabled by default.</strong><p>This appears in the release notes as a fairly minor change but it significantly boosts the defense against one of the sneakiest problems in data management - silent data corruption.<p>Let’s dive into what this feature is, what the new default means for you, and how it impacts upgrades.<h2 id=what-is-a-data-checksum><a href=#what-is-a-data-checksum>What is a data checksum?</a></h2><p>A data checksum is a simple but powerful technique to verify the integrity of data pages stored on disk. It's like a digital fingerprint for every 8KB block of data (a "page") in your database.<ul><li><strong>Creation:</strong> When Postgres writes a data page (table and indexes) to disk, it runs an algorithm on the page's contents to calculate a derived, small value—the <strong>checksum</strong>.<li><strong>Storage:</strong> This checksum is stored in the page header alongside the data.<li><strong>Verification:</strong> Whenever Postgres reads that page back from disk, it immediately recalculates the checksum from the data and compares it to the stored value.</ul><p>If the two values do not match, it means the data page has been altered or corrupted since it was last written. This is important because data corruption can happen <em>silently.</em> By detecting a mismatch, Postgres can immediately raise an error and alert you to a potential problem. 
Checksums are also an integral part of <a href=https://github.com/pgbackrest/pgbackrest>pgBackRest</a>, which uses these checksums to verify backups.<h2 id=what-is-initdb-and-why-does-it-matter><a href=#what-is-initdb-and-why-does-it-matter>What is initdb and why does it matter?</a></h2><p>The <code>initdb</code> command in Postgres is the utility used to create a new Postgres database cluster and initialize the data directory where Postgres stores all the permanent data. When you run initdb, it does things like:<ol><li>create the directory structure<li>create the template databases like <code>template1</code> and <code>postgres</code><li>populate the initial system catalog tables<li>create the initial version of the server configuration files<li>enable and start keeping track of checksums</ol><p>The syntax often looks something like this:<pre><code class=language-bash>/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
</code></pre><p>As an end user who uses cloud managed Postgres or even a local tool like Postgres.app, you generally never see the <code>initdb</code> command because it is a one-time administrative setup task.<h2 id=the-new-default---data-checksums-for-initdb><a href=#the-new-default---data-checksums-for-initdb>The new default <code>--data-checksums</code> for initdb</a></h2><p>In the past, database admins had to manually add the <code>--data-checksums</code> flag when running initdb to enable this feature. If you forgot or didn’t know about this feature, the new cluster was created without these built-in integrity checks.<p>The default behavior of initdb is now to <strong>enable data checksums</strong> every time a new cluster is initialized.<ul><li>old command - checksums OFF by default: <code>initdb -D /data/pg14</code><li>new default command - checksums ON by default: <code>initdb -D /data/pg18</code></ul><p>This is generally a win for Postgres best practices. Every new database cluster is now automatically equipped with this corruption defense, requiring no extra effort.<h3 id=--no-data-checksums><a href=#--no-data-checksums><code>--no-data-checksums</code></a></h3><p>You might have a very specific reason to disable checksums, and you can explicitly opt out using the new flag:<pre><code class=language-bash>initdb --no-data-checksums -D /data/pg18
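
# Before choosing this flag, it can help to confirm how the old cluster was initialized.
# A sketch, not from the original post; /data/pg14 is the example path used above.
pg_controldata /data/pg14 | grep -i checksum   # a checksum version of 0 means checksums are off
psql -c "SHOW data_checksums;"                 # on a running server, reports on or off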
</code></pre><h2 id=checksums-and-pg_upgrade><a href=#checksums-and-pg_upgrade>Checksums and <code>pg_upgrade</code></a></h2><p>While the new default is great, it may introduce a compatibility issue for those doing a major version upgrade using the <code>pg_upgrade</code> utility.<p>pg_upgrade works by connecting an old data directory to a new data directory, and a fundamental requirement is that both clusters must have the same checksum setting, either both ON or both OFF.<p>If you are upgrading an older Postgres cluster that was created before this change, chances are it has checksums disabled, and pg_upgrade will fail because the settings do not match.<p>In an upgrade pinch, to upgrade a non-checksum-enabled cluster, you can use the new <code>--no-data-checksums</code> flag when initializing the new cluster to make the settings align.<h3 id=upgrading-an-existing-postgres-database-to-checksums><a href=#upgrading-an-existing-postgres-database-to-checksums>Upgrading an existing Postgres database to checksums</a></h3><p>Instead of continuing forever with no data checksums, the better long-term solution is to add checksums to your database before the next upgrade. Sadly, there’s really no way to do this without some downtime and a restart. Adding checksums to a large existing database can be a slow process. There’s a well-documented <a href=https://www.crunchydata.com/blog/fun-with-pg_checksums>pg_checksums utility</a> to help with this.<p>We have helped a few folks with this issue. For larger no-downtime environments, you can add the checksums on a replica machine and then fail over to that.<h2 id=summary><a href=#summary>Summary</a></h2><p>Postgres checksums are a great feature, and they are the default from Postgres 18 onward. If you haven’t used checksums in the past, you may want to start planning now for adding them, especially since a self-managed major version upgrade will require a bit of extra thinking. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Greg.Sabino.Mullane@crunchydata.com (Greg Sabino Mullane) ]]></author>
<dc:creator><![CDATA[ Greg Sabino Mullane ]]></dc:creator>
<guid isPermalink="false">fa1787ed297110b99885d60008c312cb0ebd13f901f1167f84a7af3a4dcf9755</guid>
<pubDate>Thu, 11 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-11T13:00:00.000Z</dc:date>
<atom:updated>2025-12-11T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ PostGIS Performance: Simplification ]]></title>
<link>https://www.crunchydata.com/blog/postgis-performance-simplification</link>
<description><![CDATA[ Slim down the size of geometries with ST_Simplify. Also learn about ST_SimplifyVW, ST_RemoveRepeatedPoints, ST_SnapToGrid, ST_ReducePrecision, and ST_CoverageClean to make your PostGIS as snappy as ever. ]]></description>
<content:encoded><![CDATA[ <p>There’s nothing simple about simplification! It is very common to want to slim down the size of geometries, and there are lots of different approaches to the problem.<p>We will explore different methods starting with <a href=https://postgis.net/docs/ST_Letters.html>ST_Letters</a> for this rendering of the letter “a”.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/87b6bf75-85ef-4198-0bc3-6ff8ee860f00/public><pre><code class=language-sql>SELECT ST_Letters('a');
</code></pre><p>This is a good starting point, but to show the different effects of different algorithms on things like redundant linear points, we need a shape with more vertices along the straights, and fewer along the curves.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/7ef69e54-94ed-4c81-2ac0-49b908572f00/public><pre><code class=language-sql>SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1);
</code></pre><p>Here we add vertices every one meter with <a href=https://postgis.net/docs/ST_Segmentize.html>ST_Segmentize</a>, then use <a href=https://postgis.net/docs/ST_RemoveRepeatedPoints.html>ST_RemoveRepeatedPoints</a> to thin out the points along the curves. Already we are simplifying!<p>Let’s apply the same “remove repeated” algorithm, with a 10 meter tolerance.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/f9915d0e-ee91-4c92-3953-9c6bbd6c3f00/public><pre><code class=language-sql>WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_RemoveRepeatedPoints(a, 10) FROM a;
</code></pre><p>We do have a lot fewer points, and the constant angle curves are well preserved, but some straight lines are no longer legible as such, and there are redundant vertices in the vertical straight lines.<p>The <a href=https://postgis.net/docs/ST_Simplify.html>ST_Simplify</a> function applies the <a href=https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm>Douglas-Peucker</a> line simplification algorithm to the rings of the polygon. Because it is a line simplifier, it does a cruder job of preserving some aspects of the polygon area, like the squareness of the top ligature.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/30262e64-b2e3-4972-ca52-5cd0b6cdc100/public><pre><code class=language-sql>WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_Simplify(a, 1) FROM a;
</code></pre><p>The <a href=https://postgis.net/docs/ST_SimplifyVW.html>ST_SimplifyVW</a> function applies the Visvalingam–Whyatt algorithm to the rings of the polygon. Visvalingam–Whyatt is better for preserving the shapes of polygons than Douglas-Peucker, but the differences are subtle.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/3a27cbbf-963f-418e-1e73-48fb92344600/public><pre><code class=language-sql>WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_SimplifyVW(a, 5) FROM a;
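
-- To compare the two simplifiers by vertex count rather than by eye (a sketch, not part
-- of the original post; the column aliases are made up):
WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_NPoints(a)                   AS original_points,
       ST_NPoints(ST_Simplify(a, 1))   AS douglas_peucker_points,
       ST_NPoints(ST_SimplifyVW(a, 5)) AS visvalingam_whyatt_points
FROM a;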
</code></pre><p>Coercing a shape onto a fixed precision grid is another form of simplification, sometimes used to force the edges of adjacent objects to line up exactly. The original such function, <a href=https://postgis.net/docs/ST_SnapToGrid.html>ST_SnapToGrid</a>, does exactly what its name says. Every vertex is rounded to a fixed grid point.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/c5d2b0aa-8e11-4108-2f81-3b903fc63200/public><pre><code class=language-sql>WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_SnapToGrid(a, 5) FROM a;
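
-- To check validity in SQL instead of by inspecting the picture (a sketch, not part of
-- the original post; the column aliases are made up):
WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_IsValid(ST_SnapToGrid(a, 5))       AS snapped_is_valid,
       ST_IsValidReason(ST_SnapToGrid(a, 5)) AS snapped_reason,
       ST_IsValid(ST_ReducePrecision(a, 5))  AS reduced_is_valid
FROM a;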
</code></pre><p>However, as you can see at the top left, the grid snapper frequently generates invalidity in polygons, such as the self-intersecting ring in this example.<p>A more modern alternative is precision reduction.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/5b44a882-66f6-4009-d2a8-88115dac5200/public><pre><code class=language-sql>WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_ReducePrecision(a, 5) FROM a;
</code></pre><p>The <a href=https://postgis.net/docs/ST_ReducePrecision.html>ST_ReducePrecision</a> function not only snaps geometries to a fixed precision grid, it also ensures that outputs are always valid.<p>Because grid snapping tends to introduce a lot of vertices along straight edges, combining it with a line simplifier makes a lot of sense.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/6b7d8270-4d68-4a95-2afe-9a1f1a7ef000/public><pre><code class=language-sql>WITH a AS (
  SELECT ST_RemoveRepeatedPoints(ST_Segmentize(ST_Letters('a'), 1), 1) AS a
)
SELECT ST_Simplify(ST_ReducePrecision(a, 5),1) FROM a;
</code></pre><p>Simplifying single geometries is all well and good, but what about simplifying groups of geometries? Specifically ones that share boundaries?<p>Fortunately, as of PostGIS 3.6 there is a complete set of functions for that problem.<p>We start with a pair of polygons with a non-matched shared boundary.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/87cc07bf-1bad-4b6d-9870-108204f06d00/public><p>Non-clean boundaries can be cleaned up with the <a href=https://postgis.net/docs/ST_CoverageClean.html>ST_CoverageClean</a> function.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/28bdf680-61db-4403-f7a7-c8f8df052100/public><pre><code class=language-sql>-- Assumes the polys table stores its polygons in a geometry column named geom
SELECT ST_CoverageClean(geom) OVER () AS geom FROM polys;
</code></pre><p>And once the coverage is clean, the shapes, including their shared borders, can be simplified with <a href=https://postgis.net/docs/ST_CoverageSimplify.html>ST_CoverageSimplify</a>.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/7c683fde-0aa2-4225-9f40-5a4f3ef0d200/public><pre><code class=language-sql>WITH clean AS (
  -- Assumes the polys table stores its polygons in a geometry column named geom
  SELECT ST_CoverageClean(geom) OVER () AS geom FROM polys
)
SELECT ST_CoverageSimplify(geom, 10) OVER () FROM clean;
</code></pre> ]]></content:encoded>
<category><![CDATA[ PostGIS Performance ]]></category>
<author><![CDATA[ Paul.Ramsey@crunchydata.com (Paul Ramsey) ]]></author>
<dc:creator><![CDATA[ Paul Ramsey ]]></dc:creator>
<guid isPermalink="false">457287f269c3bd9a2b4a707e0d61a3a311672065875f1932040c593ac774b7b5</guid>
<pubDate>Tue, 09 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-09T13:00:00.000Z</dc:date>
<atom:updated>2025-12-09T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres Scan Types in EXPLAIN Plans ]]></title>
<link>https://www.crunchydata.com/blog/postgres-scan-types-in-explain-plans</link>
<description><![CDATA[ What is a sequential scan vs index scan vs parallel scan .... and what is a bitmap heap scan? Postgres scan types explained and diagrammed. ]]></description>
<content:encoded><![CDATA[ <p>The secret to unlocking performance gains often lies not just in <em>what</em> you ask in a query, but in <em>how</em> Postgres finds the answer. The Postgres <code>EXPLAIN</code> system is great for understanding how data is being queried. One of the secrets to reading EXPLAIN plans is understanding the <strong>type of scan</strong> done to retrieve the data. The scan type can be the difference between a lightning-fast response or a slow query.<p><img alt="postgres explain plan"loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/fb4c4eb8-e74c-4f68-8981-76ffbc6be300/public><p>Today I’ll break down the most common scan types, how they work, and when you’ll see them in your queries.<h2 id=sequential-scan><a href=#sequential-scan>Sequential scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/46e6aee0-8119-4a09-14fc-90890cf30e00/public" alt="postgres sequential scan, seq scan" style="float: left; margin: 0 15px 10px 0;"><p>This type of data scan reads the entire table row by row, checking to see what matches the query conditions. If you have a WHERE or FILTER, Postgres just scans each row looking for matches.<p>Sequential scans are the foundation of how scans are done, and for many queries this is what Postgres will use. For very large data sets, or those queried often, sequential scans are not ideal and an index scan may be faster. For that reason, knowing how to spot a seq scan vs index scan when reading an <code>EXPLAIN</code> plan is one of the most important parts of reading a query plan.<pre><code class=language-sql>EXPLAIN select * from accounts;

QUERY PLAN
-------------------------------------------------------------
Seq Scan on accounts  (cost=0.00..22.70 rows=1270 width=36)
(1 row)
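-- Sketch, not from the original run: adding a WHERE clause on a column with no
-- index (the column name "balance" is hypothetical) would typically still
-- produce a Seq Scan, now with a "Filter:" line under the node.
EXPLAIN SELECT * FROM accounts WHERE balance > 100;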
</code></pre><p><br><br><br><br><br><h2 id=index-scan><a href=#index-scan>Index Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/ba98105a-7268-4cf4-e2cc-0b12eec5ee00/public" alt="postgres index scan" style="float: left; margin: 0 15px 10px 0;"><p>When you create an index in Postgres, you’re creating a column or multi-column reference that is stored on disk. Postgres is able to use this index as a map to the data stored in the table. A basic index scan uses a B-tree to quickly find the exact location of the data using a two-step process: first Postgres finds the entry in the index, then it uses that reference to fetch the rest of the row data from the table.<pre><code class=language-sql>EXPLAIN select * from accounts where id = '5';

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using accounts_pkey on accounts  (cost=0.15..2.37 rows=1 width=36)
   Index Cond: (id = 5)
(2 rows)
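-- Sketch with hypothetical names: an index on a non-key column lets equality
-- lookups on that column use an index scan as well. The column "email" and the
-- index name are assumptions for illustration, not part of the original schema.
CREATE INDEX idx_accounts_email ON accounts (email);
EXPLAIN SELECT * FROM accounts WHERE email = 'user@example.com';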
</code></pre><p>Note that primary keys are automatically indexed with a B-tree index, so queries that involve a primary key may use an index scan.<p>An index scan is typically faster than a sequential scan in Postgres when a query needs to retrieve only a very small fraction of rows from a large table. Using the index is faster than scanning the whole table.<p>However, index scans are <strong>not</strong> always faster. In many situations, Postgres’ query planner will correctly choose a sequential scan. This is typically the case when the table being scanned is small or when the share of rows returned is large enough that the index offers no benefit. As a rough rule of thumb, if a query returns around 10% or more of the rows, a sequential scan is probably faster. <br><br><br><h2 id=bitmap-index-scan><a href=#bitmap-index-scan>Bitmap Index Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/b2489008-9acf-4c48-2ad5-d570e3848800/public" alt="bitmap index scan" style="float: left; margin: 0 15px 10px 0;"><p>If neither an index scan nor a seq scan is the perfect option, Postgres can use the bitmap index scan as a kind of hybrid approach. It is typically chosen when a query matches too many rows for a regular index scan, but not so many that a sequential scan would be the best option.<p>This shows up in an EXPLAIN plan as a two-phased approach.<ol><li><strong>Bitmap Index Scan:</strong> First, Postgres scans one or more indexes to create an in-memory "bitmap", a simple map of all the table pages that <em>might</em> contain rows you need.<li><strong>Bitmap Heap Scan:</strong> The bitmap is used to visit the main table. The key here is that it reads the required pages from the disk sequentially, which can be much faster than the random jumping of a standard index scan.</ol><p>Bitmap index scans are common when a query has multiple filter conditions that each have a separate index. The bitmap scan allows the database to use separate indexes on different columns simultaneously. 
You’ll see this scan come up with <code>WHERE</code> conditions joined by <code>AND</code> or <code>OR</code> operators.<pre><code class=language-sql>EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, registration_date
FROM customer_records
WHERE gender = 'F'
  AND state_code = 'NY';
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on customer_records  (cost=835.78..8669.29 rows=49226 width=12) (actual time=5.717..38.642 rows=50184.00 loops=1)
   Recheck Cond: (state_code = 'NY'::bpchar)
   Filter: (gender = 'F'::bpchar)
   Rows Removed by Filter: 49682
   Heap Blocks: exact=6370
   Buffers: shared hit=6370 read=87
   ->  Bitmap Index Scan on idx_customer_state  (cost=0.00..823.48 rows=97567 width=0) (actual time=4.377..4.378 rows=99866.00 loops=1)
         Index Cond: (state_code = 'NY'::bpchar)
         Index Searches: 1
         Buffers: shared read=87
 Planning:
   Buffers: shared hit=27 read=2
 Planning Time: 0.774 ms
 Execution Time: 40.572 ms
(14 rows)
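-- Sketch, not from the original run: a second, hypothetical index on
-- registration_date (assumed here to be a date column) gives the planner the
-- option of combining two Bitmap Index Scans under a BitmapAnd node. Whether it
-- actually does so depends on table statistics and cost estimates.
CREATE INDEX idx_customer_registration ON customer_records (registration_date);

EXPLAIN
SELECT customer_id, registration_date
FROM customer_records
WHERE state_code = 'NY'
  AND registration_date >= DATE '2025-01-01';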
</code></pre><p><br><br><br><h2 id=parallel-sequential-scan><a href=#parallel-sequential-scan>Parallel Sequential Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/7375434b-f942-4cd9-bf9a-07a775df9600/public" alt="parallel seq scan" style="float: left; margin: 0 15px 10px 0;"><p>You will see a parallel sequential scan when Postgres uses multiple background workers to perform more than one sequential scan on a single large table <em>at the same time</em>. The table is broken into chunks, and each worker gets a chunk to scan, and the results are combined at the end in a gather process. Depending on your query - you may also have an aggregate or sort after the parallel queries and before the final gather. This is part of <a href=https://www.crunchydata.com/blog/parallel-queries-in-postgres>Postgres’ parallel query function</a>.<pre><code class=language-sql>EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT id, data_value
FROM parallel_test
WHERE data_value &lt; 100000
ORDER BY data_value DESC
LIMIT 1000;

                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=161310.11..161431.04 rows=1000 width=16) (actual time=130.300..140.555 rows=1000.00 loops=1)
   Output: id, data_value
   Buffers: shared hit=142685
   ->  Gather Merge  (cost=161310.11..220311.14 rows=487915 width=16) (actual time=130.299..140.468 rows=1000.00 loops=1)
         Output: id, data_value
         Workers Planned: 5
         Workers Launched: 5
         Buffers: shared hit=142685
         ->  Sort  (cost=160310.04..160553.99 rows=97583 width=16) (actual time=112.942..112.973 rows=861.17 loops=6)
               Output: id, data_value
               Sort Key: parallel_test.data_value DESC
               Sort Method: top-N heapsort  Memory: 163kB
               Buffers: shared hit=142685
               Worker 0:  actual time=112.535..112.571 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 164kB
                 Buffers: shared hit=21729
               Worker 1:  actual time=112.271..112.308 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 164kB
                 Buffers: shared hit=21573
               Worker 2:  actual time=112.465..112.500 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 164kB
                 Buffers: shared hit=20549
               Worker 3:  actual time=99.099..99.133 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 163kB
                 Buffers: shared hit=17033
               Worker 4:  actual time=112.333..112.368 rows=1000.00 loops=1
                 Sort Method: top-N heapsort  Memory: 163kB
                 Buffers: shared hit=19964
               ->  Parallel Seq Scan on public.parallel_test  (cost=0.00..154959.67 rows=97583 width=16) (actual time=19.238..99.868 rows=83250.83 loops=6)
                     Output: id, data_value
                     Filter: (parallel_test.data_value &lt; '100000'::numeric)
                     Rows Removed by Filter: 750082
                     Buffers: shared hit=142500
                     Worker 0:  actual time=18.837..99.169 rows=83026.00 loops=1
                       Buffers: shared hit=21692
                     Worker 1:  actual time=18.594..99.301 rows=84378.00 loops=1
                       Buffers: shared hit=21536
                     Worker 2:  actual time=18.706..99.551 rows=79196.00 loops=1
                       Buffers: shared hit=20512
                     Worker 3:  actual time=5.308..86.023 rows=81187.00 loops=1
                       Buffers: shared hit=16996
                     Worker 4:  actual time=18.694..99.497 rows=83574.00 loops=1
                       Buffers: shared hit=19927
 Planning:
   Buffers: shared hit=15
 Planning Time: 0.315 ms
 Execution Time: 140.635 ms
(47 rows)
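-- Sketch: a few standard Postgres settings that influence whether a parallel
-- plan is chosen. The value 4 below is only an example, not a recommendation.
SHOW max_parallel_workers_per_gather;
SHOW min_parallel_table_scan_size;
SET max_parallel_workers_per_gather = 4;  -- applies to the current session only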
</code></pre><p><br><br><br><h2 id=parallel-index-scan><a href=#parallel-index-scan>Parallel index scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/a406e8b5-b14f-47d3-4fcf-32f9a236c300/public" alt="parallel index scan" style="float: left; margin: 0 15px 10px 0;"><p>A parallel index scan uses the same parallel workers to scan through an index concurrently. This uses the same methodology of the index scan - except that multiple workers are doing it simultaneously. Each process reads a different part of the index and returns results. Like the other parallel scans, this ends in a gather.<p>You will see a parallel index scan done when the indexes and tables involved are very large - and the overall operation to split things up and gather them at the end is faster than handing the job to a single worker.<pre><code class=language-sql>EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT data_id, filler_text
FROM parallel_index_test
WHERE data_id BETWEEN 1000000 AND 2000000;

                                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.43..34560.34 rows=995971 width=109) (actual time=1.014..145.796 rows=1000001.00 loops=1)
   Output: data_id, filler_text
   Workers Planned: 4
   Workers Launched: 4
   Buffers: shared hit=23385
   ->  Parallel Index Scan using idx_data_id on public.parallel_index_test  (cost=0.43..33564.37 rows=248993 width=109) (actual time=0.941..38.211 rows=200000.20 loops=5)
         Output: data_id, filler_text
         Index Cond: ((parallel_index_test.data_id >= 1000000) AND (parallel_index_test.data_id &lt;= 2000000))
         Index Searches: 1
         Buffers: shared hit=23385
         Worker 0:  actual time=2.104..45.540 rows=240638.00 loops=1
           Buffers: shared hit=5640
         Worker 1:  actual time=2.174..45.169 rows=240096.00 loops=1
           Buffers: shared hit=5638
         Worker 2:  actual time=0.067..45.380 rows=242658.00 loops=1
           Buffers: shared hit=5693
         Worker 3:  actual time=0.306..45.122 rows=242292.00 loops=1
           Buffers: shared hit=5686
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.526 ms
 Execution Time: 180.660 ms
(22 rows)
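-- Sketch: to compare against a non-parallel plan, parallelism can be switched
-- off for the current session and the same query explained again.
SET max_parallel_workers_per_gather = 0;
EXPLAIN (ANALYZE, BUFFERS)
SELECT data_id, filler_text
FROM parallel_index_test
WHERE data_id BETWEEN 1000000 AND 2000000;
RESET max_parallel_workers_per_gather;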
</code></pre><p><br><br><br><h2 id=index-only-scan><a href=#index-only-scan>Index-Only Scan</a></h2><img src="https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/d561c4a4-5a31-4982-c7c5-9f322a327100/public" alt="postgres index only scan" style="float: left; margin: 0 15px 10px 0;"><p>An Index-Only Scan is the superstar of scans and answers the entire query using <em>only</em> the information stored within the index itself. An index that makes this possible is often called a “covering index”, meaning the index itself covers all the columns the query needs. When the visibility map marks the relevant pages as all-visible, Postgres never even has to touch the main table. Index only scans are a huge performance win because they’re very fast - no information needs to be retrieved from the heap table. They also typically use fewer I/O resources because indexes are very cache friendly and often in shared buffers - meaning no data needs to be read from the underlying disk.<p>Queries benefit from a covering index in these situations:<ul><li>The query is very frequently executed.<li>The current query is performing a standard index scan followed by many slow disk reads (heap fetches) and using I/O.<li>The query only requires a small subset of the table's columns, for example you select only three columns from a table of twenty.<li>The columns have a low write frequency. Any column that is indexed must be written to both the table and the index, so if you start adding covering indexes for all your columns - you’re essentially creating write amplification.<li>The new index, which must cover all needed columns, won't be excessively large. Indexes are stored on disk so you don’t want to cause storage issues.</ul><pre><code class=language-sql>EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT code, status
FROM index_only_test
WHERE code > 'CODE_050000'
ORDER BY code
LIMIT 100;
                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..2.60 rows=100 width=13) (actual time=0.346..0.362 rows=100.00 loops=1)
   Output: code, status
   Buffers: shared hit=1 read=3
   ->  Index Only Scan using idx_code_status on public.index_only_test  (cost=0.42..1068.02 rows=49000 width=13) (actual time=0.345..0.352 rows=100.00 loops=1)
         Output: code, status
         Index Cond: (index_only_test.code > 'CODE_050000'::text)
         Heap Fetches: 0
         Index Searches: 1
         Buffers: shared hit=1 read=3
 Planning:
   Buffers: shared hit=19
 Planning Time: 1.838 ms
 Execution Time: 0.385 ms
(13 rows)
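-- Sketch: the index definition is not shown in the original, so this is an
-- assumed definition that would cover the query above. An alternative form is
-- CREATE INDEX ... ON index_only_test (code) INCLUDE (status).
CREATE INDEX idx_code_status ON index_only_test (code, status);
-- VACUUM keeps the visibility map current, which helps "Heap Fetches" stay low.
VACUUM (ANALYZE) index_only_test;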
</code></pre><p><br><br><br><h2 id=summary><a href=#summary>Summary</a></h2><p>We’ve covered all the major scan types so now reading your <code>EXPLAIN</code> plans will be a little easier.<ul><li>Seq scan - Postgres looks through the whole table in sequential order to find the query data<li>Index scan - Postgres first looks at the index and then fetches the row data the index points to<li>Bitmap index scan - Postgres first reads the index and creates a <strong>bitmap</strong> of the pages that may hold matching rows. Second, Postgres reads the data heap using the bitmap, which is more efficient than a sequential scan.<li>Parallel scan - Postgres uses multiple parallel workers to scan the table and the data is gathered at the end<li>Parallel index scan - Postgres uses multiple workers to do an index scan and the data is gathered at the end<li>Index only scan - All data for the query is in the index</ul><p>And here’s everything all in one graphic:<p><img alt="postgres index only scan"loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/e886cea5-4785-4136-ba99-ff46a3b03000/original> ]]></content:encoded>
<category><![CDATA[ Production Postgres ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">08ee92c4d2dfc4ad6be90a57493965c9cdb5a8e3c06cedc4fd8eddfb425c08c9</guid>
<pubDate>Thu, 04 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-04T13:00:00.000Z</dc:date>
<atom:updated>2025-12-04T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ PostGIS Performance: Data Sampling ]]></title>
<link>https://www.crunchydata.com/blog/postgis-performance-data-sampling</link>
<description><![CDATA[ Paul shows off some tricks for sampling data, instead of querying everything. This works for regular Postgres queries too! ]]></description>
<content:encoded><![CDATA[ <p>One of the temptations database users face, when presented with a huge table of interesting data, is to run queries that interrogate every record. Got a billion measurements? What’s the average of that?!<p>One way to find out is to just calculate the average.<pre><code class=language-sql>SELECT avg(value) FROM mytable;
</code></pre><p>For a billion records, that could take a while!<p>Fortunately, the “Law of Large Numbers” is here to bail us out, stating that the average of a sample approaches the average of the population, as the sample size grows. And amazingly, the sample does not even have to be particularly large to be quite close.<p>Here’s a table of 10M values, randomly generated from a normal distribution. We know the average is zero. What will a sample of 10K values tell us it is?<pre><code class=language-sql>CREATE TABLE normal AS
  SELECT random_normal(0,1) AS values
    FROM generate_series(1,10000000);
</code></pre><p>We can take a sample using a sort, or using the <code>random()</code> function, but both of those techniques first scan the whole table, which is exactly what we want to avoid.<p>Instead, we can use the PostgreSQL <code>TABLESAMPLE</code> feature, to get a quick sample of the pages in the table and an estimate of the average.<pre><code class=language-sql>SELECT avg(values)
  FROM normal TABLESAMPLE SYSTEM (1);
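-- Sketch: BERNOULLI samples individual rows rather than whole pages. It reads
-- every page, so it is slower than SYSTEM, but it does not depend on how rows
-- are laid out on pages. REPEATABLE fixes the seed (42 is an arbitrary choice).
SELECT avg(values)
  FROM normal TABLESAMPLE BERNOULLI (1) REPEATABLE (42);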
</code></pre><p>I get an answer – 0.0031, very close to the population average – and it takes just 43 milliseconds.<p>Can this work with spatial? For the right data, it can. Imagine you had a table that had one point in it for every person in Canada (36 million of them) and you wanted to find out how many people lived in Toronto (or this red circle around Toronto).<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/97df422e-dc48-43c4-4976-a76dac474100/public><pre><code class=language-sql>SELECT count(*)
  FROM census_people
  JOIN yyz
    ON ST_Intersects(yyz.geom, census_people.geom);
</code></pre><p>The answer is 5,010,266, and it takes 7.2 seconds to return. What if we take a 10% sample?<pre><code class=language-sql>SELECT count(*)
  FROM census_people TABLESAMPLE SYSTEM (10)
  JOIN yyz
    ON ST_Intersects(yyz.geom, census_people.geom);
</code></pre><p>The sample is 10%, and the answer comes back as 508,292 (near one tenth of our actual measurement) in 2.2 seconds. What about a 1% sample?<pre><code class=language-sql>SELECT count(*)
  FROM census_people TABLESAMPLE SYSTEM (1)
  JOIN yyz
    ON ST_Intersects(yyz.geom, census_people.geom);
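-- Sketch: scaling the sampled count by the inverse of the sampling fraction
-- (100 for a 1% sample) turns it into an estimate of the full count. The
-- column alias is made up for illustration.
SELECT count(*) * 100 AS estimated_people
  FROM census_people TABLESAMPLE SYSTEM (1)
  JOIN yyz
    ON ST_Intersects(yyz.geom, census_people.geom);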
</code></pre><p>The sample is 1%, and the answer comes back as 50,379 (near one hundredth of our actual measurement) in 0.2 seconds. Still a good estimate!<p>Is this black magic? No, the <code>TABLESAMPLE SYSTEM</code> mode gets its speed by reading pages randomly. In our last example, it randomly chose 1% of the pages. Here’s what that looks like in Toronto.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/5aecb50d-6b63-453b-3d96-0c2d55d82400/public><p>See in particular how blotchy the data are in the suburban areas outside the circle. The data in the table are not randomly distributed to the pages, they came from the census data in order, and ended up loaded into the database in order. So for any given database page, the actual rows in the page will tend to be near to one another.<p>This works for this example because the amount of data is large, and the area we are summarizing is a large proportion of the total data – a seventh of the Canadian population lives in that circle.<p>If we were summarizing a smaller area, the results would not have been so good.<p><code>TABLESAMPLE SYSTEM</code> is a powerful tool, but <strong>you have to be sure that any given page has a random selection of the data you are sampling for</strong>. Our random normal example worked perfectly, because the data were perfectly random. A sample of time series data would not work well for sampling time windows (the data were probably stored in order of arrival) but might work for sampling some other value. ]]></content:encoded>
<category><![CDATA[ PostGIS Performance ]]></category>
<author><![CDATA[ Paul.Ramsey@crunchydata.com (Paul Ramsey) ]]></author>
<dc:creator><![CDATA[ Paul Ramsey ]]></dc:creator>
<guid isPermalink="false">e72f061428ac799d9d20d237d604aff51c0c0fa58b180bbff4ee094e412d0245</guid>
<pubDate>Fri, 21 Nov 2025 08:00:00 EST</pubDate>
<dc:date>2025-11-21T13:00:00.000Z</dc:date>
<atom:updated>2025-11-21T13:00:00.000Z</atom:updated></item></channel></rss>