<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/topic/postgres-18/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/topic/postgres-18</link>
<image><url>https://www.crunchydata.com/card.png</url>
<title>CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/topic/postgres-18</link>
<width>800</width>
<height>419</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Thu, 11 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-11T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Postgres 18 New Default for Data Checksums and How to Deal with Upgrades ]]></title>
<link>https://www.crunchydata.com/blog/postgres-18-new-default-for-data-checksums-and-how-to-deal-with-upgrades</link>
<description><![CDATA[ Postgres 18 defaults to checksums on. This is a good feature for data integrity but might catch you off guard with an upgrade.  ]]></description>
<content:encoded><![CDATA[ <p>In a recent Postgres <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=04bec894a04">patch</a> authored by Greg Sabino Mullane, Postgres has a new step forward for data integrity: <strong>data checksums are now enabled by default.</strong><p>This appears in the release notes as a fairly minor change but it significantly boosts the defense against one of the sneakiest problems in data management - silent data corruption.<p>Let’s dive into what this feature is, what the new default means for you, and how it impacts upgrades.<h2 id=what-is-a-data-checksum><a href=#what-is-a-data-checksum>What is a data checksum?</a></h2><p>A data checksum is a simple but powerful technique to verify the integrity of data pages stored on disk. It's like a digital fingerprint for every 8KB block of data (a "page") in your database.<ul><li><strong>Creation:</strong> When Postgres writes a data page (table and indexes) to disk, it runs an algorithm on the page's contents to calculate a derived, small value—the <strong>checksum</strong>.<li><strong>Storage:</strong> This checksum is stored in the page header alongside the data.<li><strong>Verification:</strong> Whenever Postgres reads that page back from disk, it immediately recalculates the checksum from the data and compares it to the stored value.</ul><p>If the two values do not match, it means the data page has been altered or corrupted since it was last written. This is important because data corruption can happen <em>silently.</em> By detecting a mismatch, Postgres can immediately raise an error and alert you to a potential problem. 
Checksums are also an integral part of <a href=https://github.com/pgbackrest/pgbackrest>pgBackRest</a> which uses these checksums to verify backups.<h2 id=what-is-initdb-and-why-does-it-matter><a href=#what-is-initdb-and-why-does-it-matter>What is initdb and why does it matter?</a></h2><p>The <code>initdb</code> command in Postgres is the utility used to create a new Postgres database cluster and initializes the data directory where Postgres stores all the permanent data. When you run initdb, it does things like:<ol><li>create the directory structure<li>create the template databases like <code>template1</code> and <code>postgres</code><li>populate the initial system catalog tables<li>create the initial version of the server configuration files<li>enable and start keeping track of checkums</ol><p>The syntax often looks something like this:<pre><code class=language-bash>/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
</code></pre><p>As an end user who uses cloud managed Postgres or even a local tool like Postgres.app, you generally never see the <code>initdb</code> command because it is a one-time administrative setup task.<h2 id=the-new-default---data-checksums-for-initdb><a href=#the-new-default---data-checksums-for-initdb>The new default <code>--data-checksums</code> for initdb</a></h2><p>In the past database admins had to manually add the <code>--data-checksums</code> flag when running initdb to enable this feature. If you forgot or didn’t know about this feature, the new cluster was created without these built-in integrity checks.<p>The default behavior of initdb is now to <strong>enable data checksums</strong> every time Postgres is initiated.<ul><li>old command - checksums OFF by default: <code>initdb -D /data/pg14</code><li>new default command - checksums ON by default: <code>initdb -D /data/pg18</code></ul><p>This is generally a win for Postgres best practices. Every new database cluster is now automatically equipped with this corruption defense, requiring no extra effort.<h3 id=--no-data-checksums><a href=#--no-data-checksums><code>--no-data-checksums</code></a></h3><p>You might have a very specific reason to disable checksums and you can explicitly opt out using the new flag:<pre><code class=language-sbash>initdb --no-data-checksums -D /data/pg18
</code></pre><h2 id=checksums-and-pg_upgrade><a href=#checksums-and-pg_upgrade>Checksums and <code>pg_upgrade</code></a></h2><p>While the new default is great, it may introduce a compatibility issue for those doing a major version upgrade using the <code>pg_upgrade</code> utility.<p>pg_upgrade works by connecting an old data directory to a new data directory and a fundamental requirement is that both clusters must have the same checksum setting—either both ON or both OFF.<p>If you are upgrading an older Postgres cluster that was created before this change, chances are it has checksums disabled and pg_upgrade will fail because the settings mismatch.<p>In an upgrade pinch, to upgrade a non-checksum-enabled cluster, you can use the new <code>--no-data-checksums</code> flag when initializing the new cluster to make the settings align.<h3 id=upgrading-an-existing-postgres-database-to-checksums><a href=#upgrading-an-existing-postgres-database-to-checksums>Upgrading an existing Postgres database to checksums</a></h3><p>Instead of continuing forever with no data checksums, the better long term solution is to add checksums to your database before the next upgrade. Sadly, there’s really no way to do this without some downtime and a restart. Adding checksums to an existing database can be a slow process with a large database. There’s a <a href=https://www.crunchydata.com/blog/fun-with-pg_checksums>pg_checksums utility</a> to help with this which is well documented.<p>We have helped a few folks with this issue. For larger no-downtime environments, you can add the checkums on a replica machine and then fail over to that.<h2 id=summary><a href=#summary>Summary</a></h2><p>Postgres checksums are a great feature - and will be the default in the future. If you haven’t used checksums in the past, you may want to start planning now for adding them, especially since a self managed major version upgrade will require a bit of extra thinking. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Greg.Sabino.Mullane@crunchydata.com (Greg Sabino Mullane) ]]></author>
<dc:creator><![CDATA[ Greg Sabino Mullane ]]></dc:creator>
<guid isPermalink="false">fa1787ed297110b99885d60008c312cb0ebd13f901f1167f84a7af3a4dcf9755</guid>
<pubDate>Thu, 11 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-11T13:00:00.000Z</dc:date>
<atom:updated>2025-12-11T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres 18: OLD and NEW Rows in the RETURNING Clause ]]></title>
<link>https://www.crunchydata.com/blog/postgres-18-old-and-new-in-the-returning-clause</link>
<description><![CDATA[ Postgres 18 now lets you see both old and new data when you add the RETURNING clause to an UPDATE statement ]]></description>
<content:encoded><![CDATA[ <p>Postgres 18 <a href=https://www.postgresql.org/about/news/postgresql-18-released-3142/>was released today</a>. Well down page from headline features like async I/O and UUIDv7 support, we get this nice little improvement:<blockquote><p>This release adds the capability to access both the previous (OLD) and current (NEW) values in the RETURNING clause for INSERT, UPDATE, DELETE and MERGE commands.</blockquote><p>It's not a showstopper the way async I/O is, but it <em>is</em> one of those small features that's invaluable in the right situation.<p>A simple demonstration with <code>UPDATE</code> to get all old and new values:<pre><code class=language-sql>UPDATE fruit
SET quantity = 300
WHERE item = 'Apples'
RETURNING OLD.*, NEW.*;

 id |  item  | quantity | id |  item  | quantity
----+--------+----------+----+--------+----------
  5 | Apples |      200 |  5 | Apples |      300
(1 row)
</code></pre><h2 id=detecting-new-rows-with-old-on-upsert><a href=#detecting-new-rows-with-old-on-upsert>Detecting new rows with <code>OLD</code> on upsert</a></h2><p>Say we're doing an upsert and want to differentiate between whether a row sent back by <code>RETURNING</code> was one that was newly inserted or an existing row that was updated. This was possible before, but relied on an unintuitive check on <code>xmax = 0</code> (see the very last line below):<pre><code class=language-sql>INSERT INTO webhook (
    id,
    data
) VALUES (
    @id,
    @data
)
ON CONFLICT (id)
    DO UPDATE SET id = webhook.id -- force upsert to return a row
RETURNING webhook.*,
    (xmax = 0) AS is_new;
</code></pre><p>The statement relies on <code>xmax</code> being set to zero for a fresh insert as an artifact of Postgres' locking implementation (see a <a href=https://stackoverflow.com/a/39204667>full explanation for why this happens</a>). It works, but isn't a guaranteed part of the API, and could conceivably change at any time.<p>In Postgres 18, we can reimplement the above so it's more legible and doesn't rely on implementation details. It's easy too -- just check whether <code>OLD</code> is null in the returning clause:<pre><code class=language-sql>INSERT INTO webhook (
    id,
    data
) VALUES (
    @id,
    @data
)
ON CONFLICT (id)
    DO UPDATE SET id = webhook.id -- force upsert to return a row
RETURNING webhook.*,
    (OLD IS NULL)::boolean AS is_new;
</code></pre><p>Access to <code>OLD</code> and <code>NEW</code> will undoubtedly have many other useful cases, but this is one example that lets us improve pre-18 code right away. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Brandur.Leach@crunchydata.com (Brandur Leach) ]]></author>
<dc:creator><![CDATA[ Brandur Leach ]]></dc:creator>
<guid isPermalink="false">75c16f175890e6bc129bc2ce95db52f65a2524e67283b61425649a9d791a8270</guid>
<pubDate>Thu, 25 Sep 2025 11:00:00 EDT</pubDate>
<dc:date>2025-09-25T15:00:00.000Z</dc:date>
<atom:updated>2025-09-25T15:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres’ Original Project Goals: The Creators Totally Nailed It ]]></title>
<link>https://www.crunchydata.com/blog/the-postgres-project-original-goals-and-how-the-creators-totally-nailed-it</link>
<description><![CDATA[ Dig in to the original goals of the Postgres academic project at UC Berkeley and how they shaped the Postgres we use today. ]]></description>
<content:encoded><![CDATA[ <p>I had a chance last week to sit down and read the <a href=https://dsf.berkeley.edu/papers/ERL-M85-95.pdf>original academic paper announcing Postgres</a> as a platform and the original design goals from 1986. I was just awestruck at the forethought - and how the original project goals laid the foundation for the database that seems to be taking over the world right now.<p>The PostgreSQL creators totally nailed it. They laid out a flexible framework for a variety of business use cases that would eventually become the most popular database 30 years later.<p>The paper outlines 6 project goals:<ol><li><p>better support for complex objects growing world of business and engineering use cases<li><p>provide user extendibility for data types, operators and access methods<li><p>provide facilities for active databases like alerters and triggers<li><p>simplify process for crash recovery<li><p>take advantage of upgraded hardware<li><p>utilize Codd’s relational model</ol><p>Let's look at all of them in reference to modern features of Postgres.<h2 id=1-objects-and-data-types-for-a-growing-world-of-business-and-engineering-use-cases><a href=#1-objects-and-data-types-for-a-growing-world-of-business-and-engineering-use-cases>1) Objects and data types for a growing world of business and engineering use cases</a></h2><p>Postgres has a rich and flexible set of native data types that are designed to meet a vast array of business use cases, from simple record-keeping to complex data analysis.<p>Numeric Types like <code>SMALLINT</code> and <code>INTEGER</code> are used for whole numbers while <code>BIGINT</code> might be for a user's unique ID or primary keys. Precision like <code>NUMERIC</code> and  <code>DECIMAL</code> are used, exact precision is critical, especially for <a href=https://www.crunchydata.com/blog/working-with-money-in-postgres>money in Postgres</a>. 
Floating-Point Types like <code>REAL</code> or <code>DOUBLE PRECISION</code> can be used for scientific or engineering calculations where absolute precision isn't as important as the range of values. You also have your <code>UUID</code> (<a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18>indexable UUIDs</a> in Postgres 18) for distributed systems and secure URLs.<p>Character Types like <code>VARCHAR(n)</code> or <code>CHAR(n)</code> store variable-length text up to a specified maximum length (n) and only use as much storage as needed for the actual text.<p>Date/Time Types like <code>DATE</code> stores only the date (year, month, day).  <a href=https://www.crunchydata.com/blog/working-with-time-in-postgres><code>TIMESTAMPTZ</code></a> is the time and date GOAT with and is easily implemented into global systems.<p>But, wait, that’s not all, Postgres has within it, the ability to easily make <strong>custom data types</strong> and constrain data to the specifics of each use case.<p><a href=https://www.crunchydata.com/blog/intro-to-postgres-custom-data-types#using-create-domain>Using CREATE DOMAIN</a> you can create specific value check like confirming a range for birthday or email format validity.<pre><code class=language-sql>-- Postgres create domain
CREATE DOMAIN date_of_birth AS date
CHECK (value > '1930-01-01'::date);

CREATE DOMAIN valid_email AS text
NOT NULL
CHECK (value ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$');
</code></pre><p>Or using a direct <code>CREATE TYPE</code> you can make a new type as a composite. For example, new custom date type allowing for storage of height, width, and, weight in a single field.<pre><code class=language-sql>-- Postgres create type with composite
CREATE TYPE physical_package AS (
height numeric,
width numeric,
weight numeric);
</code></pre><p><a href=https://www.crunchydata.com/blog/enums-vs-check-constraints-in-postgres><code>Enums</code></a> let you create a custom type with a set of predefined values.<pre><code class=language-sql>-- Postgres enum
CREATE TYPE order_status AS ENUM (
'pending',
'shipped',
'cancelled');
</code></pre><p>Constraints take the enumerated type a bit further and let you specify rules and restrictions for data. Additionally adding a <code>CHECK</code> constraint to a list or even refer to other fields, like reserving a room with a start and end time.<pre><code class=language-sql>-- Postgres check contraint
ALTER TABLE public.reservations
ADD CONSTRAINT start_before_end
CHECK (start_time &#60 end_time);
</code></pre><p>While most applications will constrain data in its own way, Postgres’ strict and flexible typing allows both rigid validity and flexibility.<h2 id=2-extensibility-for-data-types-operators-and-access-methods><a href=#2-extensibility-for-data-types-operators-and-access-methods>2) Extensibility for data types, operators and access methods</a></h2><p>The authors knew that just data types wouldn’t be enough - the system would actually need to be extensible. In my estimation - this is actually the killer feature of Postgres. Sure, the database is solid  - but the ingenuity and enthusiasm of the extension ecosystem is incredibly special.<p>Let’s take PostGIS for example. This extension adds several key data types to the mix - the point, line, polygon, to store geospatial types. PostGIS also has hundreds of functions with it. There’s now an entire ecosystem of its own around this project that includes open-source mapping and fully open source web servers that rival paid GIS systems like ESRI.<p>The <code>pgvector extension</code> is another good example of Postgres extensibility too. Now <a href=https://www.crunchydata.com/blog/whats-postgres-got-to-do-with-ai>Postgres can store embedding data</a> right alongside application data. You can have LLMs create embeddings based on your data and you can query your data to find relatedness. You can also build your own <a href=https://www.crunchydata.com/blog/smarter-postgres-llm-with-retrieval-augmented-generation>Postgres RAG</a> system right inside your database<pre><code class=language-sql>-- find distance between two embedding values
recipe_1.embedding &#60=> recipe_2.embedding
</code></pre><p>Data types and extensions aren’t the only thing that came out of this idea though - the indexes themselves in Postgres are incredibly advanced. Generalized Inverted Index (GIN) and Generalized Search Tree (GiST) are themselves extensible indexing frameworks that support many of the complex data types mentioned above.<h2 id=3-features-for-active-databases-like-alerters-and-triggers><a href=#3-features-for-active-databases-like-alerters-and-triggers>3) Features for active databases like alerters and triggers</a></h2><p>Modern Postgres users have a suite of tools available to them to have the database do necessary work. The trigger system easily updates fields once another field changes.<pre><code class=language-sql>-- Postgres sample function to update fields
CREATE OR REPLACE FUNCTION update_inventory_on_sale()
RETURNS TRIGGER AS $$
BEGIN
UPDATE products
SET quantity_on_hand = quantity_on_hand - NEW.quantity_sold
WHERE id = NEW.product_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'No product found with ID %', NEW.product_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
</code></pre><p>For events outside the database, Postgres has a handy little <code>NOTIFY/LISTEN</code> mechanism for sending notifications to the outside so your application or dashboard will know when a new order was placed or a specific action happened. There’s an extension now to use the <a href=https://www.crunchydata.com/blog/real-time-database-events-with-pg_eventserv>listen notify system events as WebSockets</a>.<p>Postgres’ <a href=https://www.crunchydata.com/blog/data-to-go-postgres-logical-replication>logical replication</a> makes use of the ‘active database’ idea. PostgreSQL's logical replication is cool because it streams individual data changes rather than physical block-level copies, allowing you to replicate data between different major Postgres versions or even different platforms. This flexibility enables powerful use cases like creating specialized read replicas, consolidating multiple databases into a central one, and performing zero-downtime major version upgrades.<pre><code class=language-sql>-- Postgres create logical replication
CREATE PUBLICATION user_pub FOR TABLE user_id, forum_posts;
</code></pre><h2 id=4-simplify-process-for-crash-recovery><a href=#4-simplify-process-for-crash-recovery>4) Simplify process for crash recovery</a></h2><p>The original method of Postgres data recovery relied on writing all data modifications to the files on disk before each commit which was called "force-to-disk". Unfortunately this original implementation had major performance issues and a potential for corruption. The Write Ahead Log (WAL) which was released with version 7.1 changed this into a different system that first writes changes to a log file and then applies those changes to the main data files.<p>WAL is the foundation of all of Postgres’ amazing backup and disaster recovery story. WAL is used to create incremental backups, complete with the <a href=https://www.crunchydata.com/blog/database-terminology-explained-postgres-high-availability-and-disaster-recovery#disaster-recovery-is-about-more-than-just-availability>Point-in-Time disaster recovery</a> system that many rely on today.<p>WAL is also foundational to Postgres streaming replication, which makes high availability possible. A primary writes all database changes (inserts, updates, deletes) into its Write-Ahead Log and then "streams" these WAL records over the network to the standby (replica) nodes. The standby nodes receive these WAL records and apply them to their own copy of the database, keeping them in sync with the primary. In the event of an emergency automated failover, like <a href=https://github.com/patroni/patroni>Patroni</a>, can promote a new primary.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/bc74acb6-3405-43f1-cee3-153c8375be00/public><h2 id=5-take-advantage-of-upgraded-hardware><a href=#5-take-advantage-of-upgraded-hardware>5) Take advantage of upgraded hardware</a></h2><p>PostgreSQL was engineered for the hardware realities of its time: single-core CPUs, severely limited RAM often measured in megabytes, and slow, spinning hard drives. 
The primary design focus was on correctness and data durability over raw speed. PostgreSQL built its legendary reputation for stability and ACID compliance, ensuring that data remained safe even when running on less reliable hardware.<p>Fast forward to today, where PostgreSQL runs on hardware with dozens of CPU cores, terabytes of ultra-fast NVMe storage and vast amounts of RAM (we even have half a tb of RAM available now). PostgreSQL recently introduced <a href=https://www.crunchydata.com/blog/parallel-queries-in-postgres>parallel query execution</a> which breaks up complex queries and runs them simultaneously, gathering the results at the end. Modern PostgreSQL has also vastly improved its locking mechanisms, connection pooling solutions, and replication capabilities, evolving from a robust single-server database into a high-performance powerhouse that can scale horizontally and handle the massive, concurrent workloads of the modern internet.<p>While Postgres today does not yet have the modern CPU <a href=https://wiki.postgresql.org/wiki/Multithreading>multi-threading</a>, this is on the horizon, and Postgres 18 just added <a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18>asynchronous i/o</a>.<h2 id=6-utilize-codds-relational-model><a href=#6-utilize-codds-relational-model>6) Utilize Codd’s relational model</a></h2><p>At the height of the NoSQL movement in the late 2000s and early 2010s, a common story was told that relational databases were a relic of the past. With the rise of big and unstructured data, this old model may soon be cast out.<p>Postgres continued to do what it always has done and embraced its core strength - flexibility of data typing – and adopted some of NoSQL’s own ideas. Postgres introduced the JSON data type and then later the binary, <a href=https://www.crunchydata.com/blog/indexing-jsonb-in-postgres>indexable JSONB</a> type. 
With this update, applications can now store schema-less API driven JSON data directly in a relational database and query it efficiently using a rich set of operators and functions. With features like <a href=https://www.crunchydata.com/blog/easily-convert-json-into-columns-and-rows-with-json_table><code>json_table</code></a>, you can go between arrays or traditional tables.<p>The newest revolution in the Postgres world seems to be the adoption of technologies to tie Postgres directly to unstructured flat files. Projects like pg_duckdb, pg_mooncake, and <a href=https://www.crunchydata.com/products/warehouse>Crunchy Data Warehouse</a> use custom extensions to work directly with files in csv, Parquet, and Iceberg directly in the data lake remote object stores where they reside. Even with the data abstracted to another location, Postgres’ relational model is still relevant, efficient, and trusted.<h2 id=summary><a href=#summary>Summary</a></h2><p>With Postgres’ flexibility - you can have a fully normalized, relational schema with foreign keys and JOINs, while also having an indexed JSONB document and full spatial geometry. We’re at a point in history where AI, science, and research are backed by a database that had no idea what the world would be like when it was built. Postgres is still here.<p>These original goals have had a profound impact on the project. Allowing for complexity and flexibility in a growing business landscape, while being easy to alter for individual use cases. And being ready for hardware (and cloud) technology that makes Postgres’ distribution even easier. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">1e8a25ee9384198a8eb616a19c72c4d852e451563c2209f890ca1e7d8545a4ac</guid>
<pubDate>Tue, 23 Sep 2025 09:00:00 EDT</pubDate>
<dc:date>2025-09-23T13:00:00.000Z</dc:date>
<atom:updated>2025-09-23T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Get Excited About Postgres 18 ]]></title>
<link>https://www.crunchydata.com/blog/get-excited-about-postgres-18</link>
<description><![CDATA[ New to Postgres 18, features like asynchronous i/o, uuid v7, b-tree skip scans, and virtual generated columns. ]]></description>
<content:encoded><![CDATA[ <p>Postgres 18 will be released in just a couple weeks! Here’s some details on the most important and exciting features.<h2 id=asynchronous-io><a href=#asynchronous-io>Asynchronous i/o</a></h2><p>Postgres 18 is adding asynchronous i/o. This means faster reads for many use cases. This is also part of a bigger series of performance improvements planned for future Postgres, part of which may be multi-threading. Expect to see more on this in coming versions.<p><strong>What is async I/O?</strong><p>When <a href=https://www.crunchydata.com/blog/postgres-data-flow>data</a> isn’t in the shared memory buffers already, Postgres reads from disk, and <a href=https://www.crunchydata.com/blog/understanding-postgres-iops>I/O is needed to retrieve data</a>. Synchronous I/O means that each individual request to the disk is waited on for completion before moving on to something else. For busy databases with a lot of activity, this can be a bottleneck.<p>Postgres 18 will introduce asynchronous I/O, allowing workers to optimize idle time and improve system throughput by batching reads. Currently, Postgres relies on the operating system for intelligent I/O handling, expecting OS or storage read-ahead for sequential scans and using features like Linux's posix_fadvise for other read types like Bitmap Index Scans. Moving this work into the database with asynchronous I/O will provide a more predictable and better-performing method for batching operations at the database level. Additionally, a new system view, pg_aios, will be available to provide data about the asynchronous I/O system.<p>Postgres writes will continue to be synchronous - since this is needed for ACID compliance.<p>If async i/o seems confusing, think of it like ordering food at a restaurant. In a synchronous model, you would place your order and stand at the counter, waiting, until your food is ready before you can do anything else. 
In an asynchronous model, you place your order, receive a buzzer, and are free to go back to your table and chat with friends until the buzzer goes off, signaling that your food is ready to be picked up.<p>Async I/O will affect:<ul><li>sequential scans<li>bitmap heap scans (following the bitmap index scan)<li>some maintenance operations like VACUUM.</ul><p>By default Postgres will turn on <strong>io_method = worker</strong>. By default there are 3 workers and this can be adjusted up for systems with larger CPU workers. I haven’t seen any reliable recommendations on this, so stay tuned for more on that from our team soon.<p>For Postgres running on Linux 5.1+ you can utilize the io_uring system calls and have the invocations made via the actual backends rather than having separate processes with the optional <strong>io_method = io_uring</strong>.<h2 id=uuid-v7><a href=#uuid-v7>UUID v7</a></h2><p>UUIDs are getting a bit of an overhaul in this version by moving to v7.<p>UUIDs are randomly generated strings which are globally unique and often used for primary keys. UUIDs are popular in modern applications for a couple reasons:<ul><li>They’re unique: You can use keys generated from more than one place.<li>Decoupled:Your application can generate a primary key <em>before</em> sending the data to the database.<li>URL obscurity: If your URLs use primary keys (e.g., .../users/5), other URLs are easy to guess (.../users/6, .../users/7). With a UUID (.../users/f47ac10b-58cc-4372-a567-0e02b2c3d479), it's impossible to guess other IDs.</ul><p>A new standard for UUID v7 came out in mid-2024 via a series of standards updates. UUIDv4 was the prior version of uuid with native Postgres support. But sorting and indexing in large tables had performance issues due to the relative randomness, leading to fragmented indexes and bad locality.  UUIDv7 helps with the sort and indexing issues. 
It is still random but that first 48 bits (12 characters) are a timestamp, and the remaining bits are random; this gives better locality for data inserted around the same time and thus better indexability.<p>The timestamp part is a hexadecimal value (i.e. compressed decimal). So for example a uuid that begins with <code>01896d6e4a5d6</code> (hex) would represent the <code>2707238289622</code> (decimal) and that is the number of milliseconds since 1970.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/2bf43dd0-9a3a-4535-55c0-5f18a9a9a200/public><p>This is how the DDL will look for uuid v7:<pre><code class=language-sql>CREATE TABLE user_actions (
action_id UUID PRIMARY KEY DEFAULT uuidv7(),
user_id BIGINT NOT NULL,
action_description TEXT,
action_time TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_action_id ON user_actions (action_id);
</code></pre><h2 id=b-tree-skip-scans><a href=#b-tree-skip-scans>B-tree skip scans</a></h2><p>There’s a nice performance bump coming in Postgres 18 for some multi-column B-tree indexes.<p>In Postgres, if you have an index on columns (<code>status</code>, <code>date</code>) in a table, this index can be used to match queries which query both <code>status</code> and <code>date</code> fields, or just <code>status</code>.<p>In Postgres 17 and below, this same index cannot be used to answer queries against just the <code>date</code> field; you would have to have that column indexed separately or the database would resort to a sequence scan + filter approach if there were no appropriate indexes for that table.<p>In Postgres 18, in many cases it can automatically use this multi-column index for queries touching only the <code>date</code> field.  Known as a skip scan, this lets the system "skip" over portions of the index.<p>This works when queries don’t use the leading columns in the conditions and the omitted column has a low cardinality, like a small number of distinct values. The optimization works by:<ol><li>Identifying all the distinct values in the omitted leading column(s).<li>Effectively transform the query to add the conditions to match the leading values.<li>The resulting query is able to use existing infrastructure to optimize lookups across multiple leading columns, effectively skipping any pages in the index scan which do not match both conditions.</ol><p>For example, if we had a sales table with columns <code>status</code> and <code>date</code>, we might have a multi-column index:<pre><code class=language-sql>CREATE INDEX idx_status_date
ON sales (status, date);
</code></pre><p>An example query might have a WHERE clause that doesn’t include <code>status</code>:<pre><code class=language-sql>SELECT * FROM sales
WHERE date = '2025-01-01';
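-- Conceptually (a sketch; 'new', 'paid', and 'shipped' are hypothetical
-- status values), the three steps above mean Postgres 18 behaves as if
-- the query had been rewritten to:
--
--   SELECT * FROM sales
--   WHERE status = ANY ('{new,paid,shipped}')
--     AND date = '2025-01-01';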
</code></pre><p>Nothing in the query plan explicitly says "skip scan" — you’ll see a normal index scan like this, showing the index condition:<pre><code class=language-sql>                                QUERY PLAN
-------------------------------------------------------------
 Index Only Scan using idx_status_date on sales  (cost=0.29..21.54 rows=4 width=8)
   Index Cond: (date = '2025-01-01'::date)
(2 rows)
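-- The skip-scan decision rests on the optimizer statistics for the
-- skipped column. One way to inspect them (a sketch, assuming the
-- sales table above has been analyzed):
--
--   SELECT n_distinct FROM pg_stats
--   WHERE tablename = 'sales' AND attname = 'status';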
</code></pre><p>Before Postgres 18, a full scan would be needed here, since the leading column of the index is not in the query; with a skip scan, Postgres can use the same index.<p>In Postgres 18, because <code>status</code> has low cardinality, a compound index scan can be done. Note that this optimization only works for queries using the <code>=</code> operator, so it will not help with inequalities or ranges.<p>This all happens behind the scenes in the Postgres planner, so you don’t need to turn it on. The idea is that it will benefit analytics use cases where filters and conditions change often and aren’t necessarily aligned with existing indexes.<p>The query planner decides whether a skip scan is worthwhile based on the table’s statistics and the number of distinct values in the columns being skipped.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/6d5ed16d-2a24-4ff4-4a6c-fd42773e4b00/public><h2 id=generated-columns-on-the-fly><a href=#generated-columns-on-the-fly>Generated columns on-the-fly</a></h2><p>PostgreSQL 18 introduces virtual generated columns. Previously, generated columns were always stored on disk: their values were computed at insert or update time, adding a bit of write overhead.<p>In PostgreSQL 18, virtual is now the default for generated columns. If you define a generated column without explicitly specifying STORED, it will be created as a virtual generated column:<pre><code class=language-sql>CREATE TABLE user_profiles (
    user_id SERIAL PRIMARY KEY,
    settings JSONB,
    username VARCHAR(100) GENERATED ALWAYS AS (settings ->> 'username') VIRTUAL
);
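-- A quick sketch (not in the original post) showing the virtual column
-- in use: the value is computed when the row is read, not written to disk.
INSERT INTO user_profiles (settings)
VALUES ('{"username": "alice"}');

SELECT username FROM user_profiles;  -- evaluates settings ->> 'username' on read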
</code></pre><p>This is a great update for folks using JSON data: queries can be simplified, and data reshaping or normalization can happen on the fly as needed.<p>Note that virtual generated columns are not indexable, since they’re not stored on disk. For <a href=https://www.crunchydata.com/blog/indexing-jsonb-in-postgres>indexing of JSONB</a>, use a stored generated column or an expression index.<h2 id=oauth-20><a href=#oauth-20>OAuth 2.0</a></h2><p>Good news for folks who use Okta, Keycloak, or other managed authentication services: Postgres is now compatible with OAuth 2.0. This is configured in the main host-based authentication file (pg_hba.conf).<p>The OAuth flow uses bearer tokens: the client application presents a token instead of a password to prove its identity. The token is an opaque string whose format is determined by the authorization server. This removes the need to store passwords in the database, and it lets more robust security measures like multi-factor authentication (MFA) and single sign-on (SSO) be managed by external identity providers.<h2 id=postgres-versions-are-packed-with-other-improvements><a href=#postgres-versions-are-packed-with-other-improvements>Postgres versions are packed with other improvements</a></h2><p>Postgres 18 comes with a staggering 3,000 commits from more than 200 authors. Many of these are features, but there are also numerous behind-the-scenes additions and optimizations to the Postgres query planner and other parts of the system. Even if you don’t use the optional features, there are still performance benefits (uh ... async I/O is a biggie), bug fixes, and security patches that make upgrading on a regular cadence a good idea. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">0fe99b43c2417b308d641253451cc38618f70b171a295266a2dd8108b823f133</guid>
<pubDate>Fri, 12 Sep 2025 08:00:00 EDT</pubDate>
<dc:date>2025-09-12T12:00:00.000Z</dc:date>
<atom:updated>2025-09-12T12:00:00.000Z</atom:updated></item></channel></rss>