<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" version="2.0"><channel><title>CrunchyData Blog</title>
<atom:link href="https://www.crunchydata.com/blog/topic/postgres-18/rss.xml" rel="self" type="application/rss+xml" />
<link>https://www.crunchydata.com/blog/topic/postgres-18</link>
<image><url>https://www.crunchydata.com/card.png</url>
<title>CrunchyData Blog</title>
<link>https://www.crunchydata.com/blog/topic/postgres-18</link>
<width>800</width>
<height>419</height></image>
<description>PostgreSQL experts from Crunchy Data share advice, performance tips, and guides on successfully running PostgreSQL and Kubernetes solutions</description>
<language>en-us</language>
<pubDate>Thu, 11 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-11T13:00:00.000Z</dc:date>
<dc:language>en-us</dc:language>
<sy:updatePeriod>hourly</sy:updatePeriod>
<sy:updateFrequency>1</sy:updateFrequency>
<item><title><![CDATA[ Postgres 18 New Default for Data Checksums and How to Deal with Upgrades ]]></title>
<link>https://www.crunchydata.com/blog/postgres-18-new-default-for-data-checksums-and-how-to-deal-with-upgrades</link>
<description><![CDATA[ Postgres 18 defaults to checksums on. This is a good feature for data integrity but might catch you off guard with an upgrade.  ]]></description>
<content:encoded><![CDATA[ <p>In a recent Postgres <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=04bec894a04">patch</a> authored by Greg Sabino Mullane, Postgres has a new step forward for data integrity: <strong>data checksums are now enabled by default.</strong><p>This appears in the release notes as a fairly minor change but it significantly boosts the defense against one of the sneakiest problems in data management - silent data corruption.<p>Let’s dive into what this feature is, what the new default means for you, and how it impacts upgrades.<h2 id=what-is-a-data-checksum><a href=#what-is-a-data-checksum>What is a data checksum?</a></h2><p>A data checksum is a simple but powerful technique to verify the integrity of data pages stored on disk. It's like a digital fingerprint for every 8KB block of data (a "page") in your database.<ul><li><strong>Creation:</strong> When Postgres writes a data page (table and indexes) to disk, it runs an algorithm on the page's contents to calculate a derived, small value—the <strong>checksum</strong>.<li><strong>Storage:</strong> This checksum is stored in the page header alongside the data.<li><strong>Verification:</strong> Whenever Postgres reads that page back from disk, it immediately recalculates the checksum from the data and compares it to the stored value.</ul><p>If the two values do not match, it means the data page has been altered or corrupted since it was last written. This is important because data corruption can happen <em>silently.</em> By detecting a mismatch, Postgres can immediately raise an error and alert you to a potential problem. 
Checksums are also an integral part of <a href=https://github.com/pgbackrest/pgbackrest>pgBackRest</a> which uses these checksums to verify backups.<h2 id=what-is-initdb-and-why-does-it-matter><a href=#what-is-initdb-and-why-does-it-matter>What is initdb and why does it matter?</a></h2><p>The <code>initdb</code> command in Postgres is the utility used to create a new Postgres database cluster and initializes the data directory where Postgres stores all the permanent data. When you run initdb, it does things like:<ol><li>create the directory structure<li>create the template databases like <code>template1</code> and <code>postgres</code><li>populate the initial system catalog tables<li>create the initial version of the server configuration files<li>enable and start keeping track of checkums</ol><p>The syntax often looks something like this:<pre><code class=language-bash>/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
</code></pre><p>As an end user who uses cloud managed Postgres or even a local tool like Postgres.app, you generally never see the <code>initdb</code> command because it is a one-time administrative setup task.<h2 id=the-new-default---data-checksums-for-initdb><a href=#the-new-default---data-checksums-for-initdb>The new default <code>--data-checksums</code> for initdb</a></h2><p>In the past database admins had to manually add the <code>--data-checksums</code> flag when running initdb to enable this feature. If you forgot or didn’t know about this feature, the new cluster was created without these built-in integrity checks.<p>The default behavior of initdb is now to <strong>enable data checksums</strong> every time Postgres is initiated.<ul><li>old command - checksums OFF by default: <code>initdb -D /data/pg14</code><li>new default command - checksums ON by default: <code>initdb -D /data/pg18</code></ul><p>This is generally a win for Postgres best practices. Every new database cluster is now automatically equipped with this corruption defense, requiring no extra effort.<h3 id=--no-data-checksums><a href=#--no-data-checksums><code>--no-data-checksums</code></a></h3><p>You might have a very specific reason to disable checksums and you can explicitly opt out using the new flag:<pre><code class=language-sbash>initdb --no-data-checksums -D /data/pg18
</code></pre><h2 id=checksums-and-pg_upgrade><a href=#checksums-and-pg_upgrade>Checksums and <code>pg_upgrade</code></a></h2><p>While the new default is great, it may introduce a compatibility issue for those doing a major version upgrade using the <code>pg_upgrade</code> utility.<p>pg_upgrade works by connecting an old data directory to a new data directory and a fundamental requirement is that both clusters must have the same checksum setting—either both ON or both OFF.<p>If you are upgrading an older Postgres cluster that was created before this change, chances are it has checksums disabled and pg_upgrade will fail because the settings mismatch.<p>In an upgrade pinch, to upgrade a non-checksum-enabled cluster, you can use the new <code>--no-data-checksums</code> flag when initializing the new cluster to make the settings align.<h3 id=upgrading-an-existing-postgres-database-to-checksums><a href=#upgrading-an-existing-postgres-database-to-checksums>Upgrading an existing Postgres database to checksums</a></h3><p>Instead of continuing forever with no data checksums, the better long term solution is to add checksums to your database before the next upgrade. Sadly, there’s really no way to do this without some downtime and a restart. Adding checksums to an existing database can be a slow process with a large database. There’s a <a href=https://www.crunchydata.com/blog/fun-with-pg_checksums>pg_checksums utility</a> to help with this which is well documented.<p>We have helped a few folks with this issue. For larger no-downtime environments, you can add the checkums on a replica machine and then fail over to that.<h2 id=summary><a href=#summary>Summary</a></h2><p>Postgres checksums are a great feature - and will be the default in the future. If you haven’t used checksums in the past, you may want to start planning now for adding them, especially since a self managed major version upgrade will require a bit of extra thinking. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Greg.Sabino.Mullane@crunchydata.com (Greg Sabino Mullane) ]]></author>
<dc:creator><![CDATA[ Greg Sabino Mullane ]]></dc:creator>
<guid isPermalink="false">fa1787ed297110b99885d60008c312cb0ebd13f901f1167f84a7af3a4dcf9755</guid>
<pubDate>Thu, 11 Dec 2025 08:00:00 EST</pubDate>
<dc:date>2025-12-11T13:00:00.000Z</dc:date>
<atom:updated>2025-12-11T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres 18: OLD and NEW Rows in the RETURNING Clause ]]></title>
<link>https://www.crunchydata.com/blog/postgres-18-old-and-new-in-the-returning-clause</link>
<description><![CDATA[ Postgres 18 now lets you see both old and new data when you add the RETURNING clause to an UPDATE statement ]]></description>
<content:encoded><![CDATA[ <p>Postgres 18 <a href=https://www.postgresql.org/about/news/postgresql-18-released-3142/>was released today</a>. Well down page from headline features like async I/O and UUIDv7 support, we get this nice little improvement:<blockquote><p>This release adds the capability to access both the previous (OLD) and current (NEW) values in the RETURNING clause for INSERT, UPDATE, DELETE and MERGE commands.</blockquote><p>It's not a showstopper the way async I/O is, but it <em>is</em> one of those small features that's invaluable in the right situation.<p>A simple demonstration with <code>UPDATE</code> to get all old and new values:<pre><code class=language-sql>UPDATE fruit
SET quantity = 300
WHERE item = 'Apples'
RETURNING OLD.*, NEW.*;

 id |  item  | quantity | id |  item  | quantity
----+--------+----------+----+--------+----------
  5 | Apples |      200 |  5 | Apples |      300
(1 row)
</code></pre><h2 id=detecting-new-rows-with-old-on-upsert><a href=#detecting-new-rows-with-old-on-upsert>Detecting new rows with <code>OLD</code> on upsert</a></h2><p>Say we're doing an upsert and want to differentiate between whether a row sent back by <code>RETURNING</code> was one that was newly inserted or an existing row that was updated. This was possible before, but relied on an unintuitive check on <code>xmax = 0</code> (see the very last line below):<pre><code class=language-sql>INSERT INTO webhook (
    id,
    data
) VALUES (
    @id,
    @data
)
ON CONFLICT (id)
    DO UPDATE SET id = webhook.id -- force upsert to return a row
RETURNING webhook.*,
    (xmax = 0) AS is_new;
</code></pre><p>The statement relies on <code>xmax</code> being set to zero for a fresh insert as an artifact of Postgres' locking implementation (see a <a href=https://stackoverflow.com/a/39204667>full explanation for why this happens</a>). It works, but isn't a guaranteed part of the API, and could conceivably change at any time.<p>In Postgres 18, we can reimplement the above so it's more legible and doesn't rely on implementation details. It's easy too -- just check whether <code>OLD</code> is null in the returning clause:<pre><code class=language-sql>INSERT INTO webhook (
    id,
    data
) VALUES (
    @id,
    @data
)
ON CONFLICT (id)
    DO UPDATE SET id = webhook.id -- force upsert to return a row
RETURNING webhook.*,
    (OLD IS NULL)::boolean AS is_new;
</code></pre><p>Access to <code>OLD</code> and <code>NEW</code> will undoubtedly have many other useful cases, but this is one example that lets us improve pre-18 code right away. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Brandur.Leach@crunchydata.com (Brandur Leach) ]]></author>
<dc:creator><![CDATA[ Brandur Leach ]]></dc:creator>
<guid isPermalink="false">75c16f175890e6bc129bc2ce95db52f65a2524e67283b61425649a9d791a8270</guid>
<pubDate>Thu, 25 Sep 2025 11:00:00 EDT</pubDate>
<dc:date>2025-09-25T15:00:00.000Z</dc:date>
<atom:updated>2025-09-25T15:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Postgres’ Original Project Goals: The Creators Totally Nailed It ]]></title>
<link>https://www.crunchydata.com/blog/the-postgres-project-original-goals-and-how-the-creators-totally-nailed-it</link>
<description><![CDATA[ Dig in to the original goals of the Postgres academic project at UC Berkeley and how they shaped the Postgres we use today. ]]></description>
<content:encoded><![CDATA[ <p>I had a chance last week to sit down and read the <a href=https://dsf.berkeley.edu/papers/ERL-M85-95.pdf>original academic paper announcing Postgres</a> as a platform and the original design goals from 1986. I was just awestruck at the forethought - and how the original project goals laid the foundation for the database that seems to be taking over the world right now.<p>The PostgreSQL creators totally nailed it. They laid out a flexible framework for a variety of business use cases that would eventually become the most popular database 30 years later.<p>The paper outlines 6 project goals:<ol><li><p>better support for complex objects growing world of business and engineering use cases<li><p>provide user extendibility for data types, operators and access methods<li><p>provide facilities for active databases like alerters and triggers<li><p>simplify process for crash recovery<li><p>take advantage of upgraded hardware<li><p>utilize Codd’s relational model</ol><p>Let's look at all of them in reference to modern features of Postgres.<h2 id=1-objects-and-data-types-for-a-growing-world-of-business-and-engineering-use-cases><a href=#1-objects-and-data-types-for-a-growing-world-of-business-and-engineering-use-cases>1) Objects and data types for a growing world of business and engineering use cases</a></h2><p>Postgres has a rich and flexible set of native data types that are designed to meet a vast array of business use cases, from simple record-keeping to complex data analysis.<p>Numeric Types like <code>SMALLINT</code> and <code>INTEGER</code> are used for whole numbers while <code>BIGINT</code> might be for a user's unique ID or primary keys. Precision like <code>NUMERIC</code> and  <code>DECIMAL</code> are used, exact precision is critical, especially for <a href=https://www.crunchydata.com/blog/working-with-money-in-postgres>money in Postgres</a>. 
Floating-Point Types like <code>REAL</code> or <code>DOUBLE PRECISION</code> can be used for scientific or engineering calculations where absolute precision isn't as important as the range of values. You also have your <code>UUID</code> (<a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18>indexable UUIDs</a> in Postgres 18) for distributed systems and secure URLs.<p>Character Types like <code>VARCHAR(n)</code> or <code>CHAR(n)</code> store variable-length text up to a specified maximum length (n) and only use as much storage as needed for the actual text.<p>Date/Time Types like <code>DATE</code> stores only the date (year, month, day).  <a href=https://www.crunchydata.com/blog/working-with-time-in-postgres><code>TIMESTAMPTZ</code></a> is the time and date GOAT with and is easily implemented into global systems.<p>But, wait, that’s not all, Postgres has within it, the ability to easily make <strong>custom data types</strong> and constrain data to the specifics of each use case.<p><a href=https://www.crunchydata.com/blog/intro-to-postgres-custom-data-types#using-create-domain>Using CREATE DOMAIN</a> you can create specific value check like confirming a range for birthday or email format validity.<pre><code class=language-sql>-- Postgres create domain
CREATE DOMAIN date_of_birth AS date
CHECK (value > '1930-01-01'::date);

CREATE DOMAIN valid_email AS text
NOT NULL
CHECK (value ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$');
</code></pre><p>Or using a direct <code>CREATE TYPE</code> you can make a new type as a composite. For example, new custom date type allowing for storage of height, width, and, weight in a single field.<pre><code class=language-sql>-- Postgres create type with composite
CREATE TYPE physical_package AS (
height numeric,
width numeric,
weight numeric);
</code></pre><p><a href=https://www.crunchydata.com/blog/enums-vs-check-constraints-in-postgres><code>Enums</code></a> let you create a custom type with a set of predefined values.<pre><code class=language-sql>-- Postgres enum
CREATE TYPE order_status AS ENUM (
'pending',
'shipped',
'cancelled');
</code></pre><p>Constraints take the enumerated type a bit further and let you specify rules and restrictions for data. Additionally adding a <code>CHECK</code> constraint to a list or even refer to other fields, like reserving a room with a start and end time.<pre><code class=language-sql>-- Postgres check contraint
ALTER TABLE public.reservations
ADD CONSTRAINT start_before_end
CHECK (start_time &#60 end_time);
</code></pre><p>While most applications will constrain data in its own way, Postgres’ strict and flexible typing allows both rigid validity and flexibility.<h2 id=2-extensibility-for-data-types-operators-and-access-methods><a href=#2-extensibility-for-data-types-operators-and-access-methods>2) Extensibility for data types, operators and access methods</a></h2><p>The authors knew that just data types wouldn’t be enough - the system would actually need to be extensible. In my estimation - this is actually the killer feature of Postgres. Sure, the database is solid  - but the ingenuity and enthusiasm of the extension ecosystem is incredibly special.<p>Let’s take PostGIS for example. This extension adds several key data types to the mix - the point, line, polygon, to store geospatial types. PostGIS also has hundreds of functions with it. There’s now an entire ecosystem of its own around this project that includes open-source mapping and fully open source web servers that rival paid GIS systems like ESRI.<p>The <code>pgvector extension</code> is another good example of Postgres extensibility too. Now <a href=https://www.crunchydata.com/blog/whats-postgres-got-to-do-with-ai>Postgres can store embedding data</a> right alongside application data. You can have LLMs create embeddings based on your data and you can query your data to find relatedness. You can also build your own <a href=https://www.crunchydata.com/blog/smarter-postgres-llm-with-retrieval-augmented-generation>Postgres RAG</a> system right inside your database<pre><code class=language-sql>-- find distance between two embedding values
recipe_1.embedding &#60=> recipe_2.embedding
</code></pre><p>Data types and extensions aren’t the only thing that came out of this idea though - the indexes themselves in Postgres are incredibly advanced. Generalized Inverted Index (GIN) and Generalized Search Tree (GiST) are themselves extensible indexing frameworks that support many of the complex data types mentioned above.<h2 id=3-features-for-active-databases-like-alerters-and-triggers><a href=#3-features-for-active-databases-like-alerters-and-triggers>3) Features for active databases like alerters and triggers</a></h2><p>Modern Postgres users have a suite of tools available to them to have the database do necessary work. The trigger system easily updates fields once another field changes.<pre><code class=language-sql>-- Postgres sample function to update fields
CREATE OR REPLACE FUNCTION update_inventory_on_sale()
RETURNS TRIGGER AS $$
BEGIN
UPDATE products
SET quantity_on_hand = quantity_on_hand - NEW.quantity_sold
WHERE id = NEW.product_id;
IF NOT FOUND THEN
RAISE EXCEPTION 'No product found with ID %', NEW.product_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
</code></pre><p>For events outside the database, Postgres has a handy little <code>NOTIFY/LISTEN</code> mechanism for sending notifications to the outside so your application or dashboard will know when a new order was placed or a specific action happened. There’s an extension now to use the <a href=https://www.crunchydata.com/blog/real-time-database-events-with-pg_eventserv>listen notify system events as WebSockets</a>.<p>Postgres’ <a href=https://www.crunchydata.com/blog/data-to-go-postgres-logical-replication>logical replication</a> makes use of the ‘active database’ idea. PostgreSQL's logical replication is cool because it streams individual data changes rather than physical block-level copies, allowing you to replicate data between different major Postgres versions or even different platforms. This flexibility enables powerful use cases like creating specialized read replicas, consolidating multiple databases into a central one, and performing zero-downtime major version upgrades.<pre><code class=language-sql>-- Postgres create logical replication
CREATE PUBLICATION user_pub FOR TABLE user_id, forum_posts;
</code></pre><h2 id=4-simplify-process-for-crash-recovery><a href=#4-simplify-process-for-crash-recovery>4) Simplify process for crash recovery</a></h2><p>The original method of Postgres data recovery relied on writing all data modifications to the files on disk before each commit which was called "force-to-disk". Unfortunately this original implementation had major performance issues and a potential for corruption. The Write Ahead Log (WAL) which was released with version 7.1 changed this into a different system that first writes changes to a log file and then applies those changes to the main data files.<p>WAL is the foundation of all of Postgres’ amazing backup and disaster recovery story. WAL is used to create incremental backups, complete with the <a href=https://www.crunchydata.com/blog/database-terminology-explained-postgres-high-availability-and-disaster-recovery#disaster-recovery-is-about-more-than-just-availability>Point-in-Time disaster recovery</a> system that many rely on today.<p>WAL is also foundational to Postgres streaming replication, which makes high availability possible. A primary writes all database changes (inserts, updates, deletes) into its Write-Ahead Log and then "streams" these WAL records over the network to the standby (replica) nodes. The standby nodes receive these WAL records and apply them to their own copy of the database, keeping them in sync with the primary. In the event of an emergency automated failover, like <a href=https://github.com/patroni/patroni>Patroni</a>, can promote a new primary.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/bc74acb6-3405-43f1-cee3-153c8375be00/public><h2 id=5-take-advantage-of-upgraded-hardware><a href=#5-take-advantage-of-upgraded-hardware>5) Take advantage of upgraded hardware</a></h2><p>PostgreSQL was engineered for the hardware realities of its time: single-core CPUs, severely limited RAM often measured in megabytes, and slow, spinning hard drives. 
The primary design focus was on correctness and data durability over raw speed. PostgreSQL built its legendary reputation for stability and ACID compliance, ensuring that data remained safe even when running on less reliable hardware.<p>Fast forward to today, where PostgreSQL runs on hardware with dozens of CPU cores, terabytes of ultra-fast NVMe storage and vast amounts of RAM (we even have half a tb of RAM available now). PostgreSQL recently introduced <a href=https://www.crunchydata.com/blog/parallel-queries-in-postgres>parallel query execution</a> which breaks up complex queries and runs them simultaneously, gathering the results at the end. Modern PostgreSQL has also vastly improved its locking mechanisms, connection pooling solutions, and replication capabilities, evolving from a robust single-server database into a high-performance powerhouse that can scale horizontally and handle the massive, concurrent workloads of the modern internet.<p>While Postgres today does not yet have the modern CPU <a href=https://wiki.postgresql.org/wiki/Multithreading>multi-threading</a>, this is on the horizon, and Postgres 18 just added <a href=https://www.crunchydata.com/blog/get-excited-about-postgres-18>asynchronous i/o</a>.<h2 id=6-utilize-codds-relational-model><a href=#6-utilize-codds-relational-model>6) Utilize Codd’s relational model</a></h2><p>At the height of the NoSQL movement in the late 2000s and early 2010s, a common story was told that relational databases were a relic of the past. With the rise of big and unstructured data, this old model may soon be cast out.<p>Postgres continued to do what it always has done and embraced its core strength - flexibility of data typing – and adopted some of NoSQL’s own ideas. Postgres introduced the JSON data type and then later the binary, <a href=https://www.crunchydata.com/blog/indexing-jsonb-in-postgres>indexable JSONB</a> type. 
With this update, applications can now store schema-less API driven JSON data directly in a relational database and query it efficiently using a rich set of operators and functions. With features like <a href=https://www.crunchydata.com/blog/easily-convert-json-into-columns-and-rows-with-json_table><code>json_table</code></a>, you can go between arrays or traditional tables.<p>The newest revolution in the Postgres world seems to be the adoption of technologies to tie Postgres directly to unstructured flat files. Projects like pg_duckdb, pg_mooncake, and <a href=https://www.crunchydata.com/products/warehouse>Crunchy Data Warehouse</a> use custom extensions to work directly with files in csv, Parquet, and Iceberg directly in the data lake remote object stores where they reside. Even with the data abstracted to another location, Postgres’ relational model is still relevant, efficient, and trusted.<h2 id=summary><a href=#summary>Summary</a></h2><p>With Postgres’ flexibility - you can have a fully normalized, relational schema with foreign keys and JOINs, while also having an indexed JSONB document and full spatial geometry. We’re at a point in history where AI, science, and research are backed by a database that had no idea what the world would be like when it was built. Postgres is still here.<p>These original goals have had a profound impact on the project. Allowing for complexity and flexibility in a growing business landscape, while being easy to alter for individual use cases. And being ready for hardware (and cloud) technology that makes Postgres’ distribution even easier. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">1e8a25ee9384198a8eb616a19c72c4d852e451563c2209f890ca1e7d8545a4ac</guid>
<pubDate>Tue, 23 Sep 2025 09:00:00 EDT</pubDate>
<dc:date>2025-09-23T13:00:00.000Z</dc:date>
<atom:updated>2025-09-23T13:00:00.000Z</atom:updated></item>
<item><title><![CDATA[ Get Excited About Postgres 18 ]]></title>
<link>https://www.crunchydata.com/blog/get-excited-about-postgres-18</link>
<description><![CDATA[ New to Postgres 18, features like asynchronous i/o, uuid v7, b-tree skip scans, and virtual generated columns. ]]></description>
<content:encoded><![CDATA[ <p>Postgres 18 will be released in just a couple weeks! Here’s some details on the most important and exciting features.<h2 id=asynchronous-io><a href=#asynchronous-io>Asynchronous i/o</a></h2><p>Postgres 18 is adding asynchronous i/o. This means faster reads for many use cases. This is also part of a bigger series of performance improvements planned for future Postgres, part of which may be multi-threading. Expect to see more on this in coming versions.<p><strong>What is async I/O?</strong><p>When <a href=https://www.crunchydata.com/blog/postgres-data-flow>data</a> isn’t in the shared memory buffers already, Postgres reads from disk, and <a href=https://www.crunchydata.com/blog/understanding-postgres-iops>I/O is needed to retrieve data</a>. Synchronous I/O means that each individual request to the disk is waited on for completion before moving on to something else. For busy databases with a lot of activity, this can be a bottleneck.<p>Postgres 18 will introduce asynchronous I/O, allowing workers to optimize idle time and improve system throughput by batching reads. Currently, Postgres relies on the operating system for intelligent I/O handling, expecting OS or storage read-ahead for sequential scans and using features like Linux's posix_fadvise for other read types like Bitmap Index Scans. Moving this work into the database with asynchronous I/O will provide a more predictable and better-performing method for batching operations at the database level. Additionally, a new system view, pg_aios, will be available to provide data about the asynchronous I/O system.<p>Postgres writes will continue to be synchronous - since this is needed for ACID compliance.<p>If async i/o seems confusing, think of it like ordering food at a restaurant. In a synchronous model, you would place your order and stand at the counter, waiting, until your food is ready before you can do anything else. 
In an asynchronous model, you place your order, receive a buzzer, and are free to go back to your table and chat with friends until the buzzer goes off, signaling that your food is ready to be picked up.<p>Async I/O will affect:<ul><li>sequential scans<li>bitmap heap scans (following the bitmap index scan)<li>some maintenance operations like VACUUM.</ul><p>By default Postgres will turn on <strong>io_method = worker</strong>. By default there are 3 workers and this can be adjusted up for systems with larger CPU workers. I haven’t seen any reliable recommendations on this, so stay tuned for more on that from our team soon.<p>For Postgres running on Linux 5.1+ you can utilize the io_uring system calls and have the invocations made via the actual backends rather than having separate processes with the optional <strong>io_method = io_uring</strong>.<h2 id=uuid-v7><a href=#uuid-v7>UUID v7</a></h2><p>UUIDs are getting a bit of an overhaul in this version by moving to v7.<p>UUIDs are randomly generated strings which are globally unique and often used for primary keys. UUIDs are popular in modern applications for a couple reasons:<ul><li>They’re unique: You can use keys generated from more than one place.<li>Decoupled:Your application can generate a primary key <em>before</em> sending the data to the database.<li>URL obscurity: If your URLs use primary keys (e.g., .../users/5), other URLs are easy to guess (.../users/6, .../users/7). With a UUID (.../users/f47ac10b-58cc-4372-a567-0e02b2c3d479), it's impossible to guess other IDs.</ul><p>A new standard for UUID v7 came out in mid-2024 via a series of standards updates. UUIDv4 was the prior version of uuid with native Postgres support. But sorting and indexing in large tables had performance issues due to the relative randomness, leading to fragmented indexes and bad locality.  UUIDv7 helps with the sort and indexing issues. 
It is still random but that first 48 bits (12 characters) are a timestamp, and the remaining bits are random; this gives better locality for data inserted around the same time and thus better indexability.<p>The timestamp part is a hexadecimal value (i.e. compressed decimal). So for example a uuid that begins with <code>01896d6e4a5d6</code> (hex) would represent the <code>2707238289622</code> (decimal) and that is the number of milliseconds since 1970.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/2bf43dd0-9a3a-4535-55c0-5f18a9a9a200/public><p>This is how the DDL will look for uuid v7:<pre><code class=language-sql>CREATE TABLE user_actions (
action_id UUID PRIMARY KEY DEFAULT uuidv7(),
user_id BIGINT NOT NULL,
action_description TEXT,
action_time TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_action_id ON user_actions (action_id);
</code></pre><h2 id=b-tree-skip-scans><a href=#b-tree-skip-scans>B-tree skip scans</a></h2><p>There’s a nice performance bump coming in Postgres 18 for some multi-column B-tree indexes.<p>In Postgres, if you have an index on columns (<code>status</code>, <code>date</code>) in a table, this index can be used to match queries which query both <code>status</code> and <code>date</code> fields, or just <code>status</code>.<p>In Postgres 17 and below, this same index cannot be used to answer queries against just the <code>date</code> field; you would have to have that column indexed separately or the database would resort to a sequence scan + filter approach if there were no appropriate indexes for that table.<p>In Postgres 18, in many cases it can automatically use this multi-column index for queries touching only the <code>date</code> field.  Known as a skip scan, this lets the system "skip" over portions of the index.<p>This works when queries don’t use the leading columns in the conditions and the omitted column has a low cardinality, like a small number of distinct values. The optimization works by:<ol><li>Identifying all the distinct values in the omitted leading column(s).<li>Effectively transform the query to add the conditions to match the leading values.<li>The resulting query is able to use existing infrastructure to optimize lookups across multiple leading columns, effectively skipping any pages in the index scan which do not match both conditions.</ol><p>For example, if we had a sales table with columns <code>status</code> and <code>date</code>, we might have a multi-column index:<pre><code class=language-sql>CREATE INDEX idx_status_date
ON sales (status, date);
</code></pre><p>An example query might have a WHERE clause that doesn’t include <code>status</code>:<pre><code class=language-sql>SELECT * FROM sales
WHERE date = '2025-01-01';
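-- Conceptually (a sketch; 'new', 'paid', and 'shipped' are hypothetical
-- status values), the three steps above mean Postgres 18 behaves as if
-- the query had been rewritten to:
--
--   SELECT * FROM sales
--   WHERE status = ANY ('{new,paid,shipped}')
--     AND date = '2025-01-01';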
</code></pre><p>Nothing in the query plan explicitly says "skip scan" — you’ll see a normal index scan like this, showing the index condition:<pre><code class=language-sql>                                QUERY PLAN
-------------------------------------------------------------
 Index Only Scan using idx_status_date on sales  (cost=0.29..21.54 rows=4 width=8)
   Index Cond: (date = '2025-01-01'::date)
(2 rows)
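-- The skip-scan decision rests on the optimizer statistics for the
-- skipped column. One way to inspect them (a sketch, assuming the
-- sales table above has been analyzed):
--
--   SELECT n_distinct FROM pg_stats
--   WHERE tablename = 'sales' AND attname = 'status';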
</code></pre><p>Before Postgres 18, a full scan would be needed here, since the leading column of the index is not in the query; with a skip scan, Postgres can use the same index.<p>In Postgres 18, because <code>status</code> has low cardinality, a compound index scan can be done. Note that this optimization only works for queries using the <code>=</code> operator, so it will not help with inequalities or ranges.<p>This all happens behind the scenes in the Postgres planner, so you don’t need to turn it on. The idea is that it will benefit analytics use cases where filters and conditions change often and aren’t necessarily aligned with existing indexes.<p>The query planner decides whether a skip scan is worthwhile based on the table’s statistics and the number of distinct values in the columns being skipped.<p><img alt loading=lazy src=https://imagedelivery.net/lPM0ntuwQfh8VQgJRu0mFg/6d5ed16d-2a24-4ff4-4a6c-fd42773e4b00/public><h2 id=generated-columns-on-the-fly><a href=#generated-columns-on-the-fly>Generated columns on-the-fly</a></h2><p>PostgreSQL 18 introduces virtual generated columns. Previously, generated columns were always stored on disk: their values were computed at insert or update time, adding a bit of write overhead.<p>In PostgreSQL 18, virtual is now the default for generated columns. If you define a generated column without explicitly specifying STORED, it will be created as a virtual generated column:<pre><code class=language-sql>CREATE TABLE user_profiles (
    user_id SERIAL PRIMARY KEY,
    settings JSONB,
    username VARCHAR(100) GENERATED ALWAYS AS (settings ->> 'username') VIRTUAL
);
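-- A quick sketch (not in the original post) showing the virtual column
-- in use: the value is computed when the row is read, not written to disk.
INSERT INTO user_profiles (settings)
VALUES ('{"username": "alice"}');

SELECT username FROM user_profiles;  -- evaluates settings ->> 'username' on read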
</code></pre><p>This is a great update for folks using JSON data: queries can be simplified, and data reshaping or normalization can happen on the fly as needed.<p>Note that virtual generated columns are not indexable, since they’re not stored on disk. For <a href=https://www.crunchydata.com/blog/indexing-jsonb-in-postgres>indexing of JSONB</a>, use a stored generated column or an expression index.<h2 id=oauth-20><a href=#oauth-20>OAuth 2.0</a></h2><p>Good news for folks who use Okta, Keycloak, or other managed authentication services: Postgres is now compatible with OAuth 2.0. This is configured in the main host-based authentication file (pg_hba.conf).<p>The OAuth flow uses bearer tokens: the client application presents a token instead of a password to prove its identity. The token is an opaque string whose format is determined by the authorization server. This removes the need to store passwords in the database, and it lets more robust security measures like multi-factor authentication (MFA) and single sign-on (SSO) be managed by external identity providers.<h2 id=postgres-versions-are-packed-with-other-improvements><a href=#postgres-versions-are-packed-with-other-improvements>Postgres versions are packed with other improvements</a></h2><p>Postgres 18 comes with a staggering 3,000 commits from more than 200 authors. Many of these are features, but there are also numerous behind-the-scenes additions and optimizations to the Postgres query planner and other parts of the system. Even if you don’t use the optional features, there are still performance benefits (uh ... async I/O is a biggie), bug fixes, and security patches that make upgrading on a regular cadence a good idea. ]]></content:encoded>
<category><![CDATA[ Postgres 18 ]]></category>
<author><![CDATA[ Elizabeth.Christensen@crunchydata.com (Elizabeth Christensen) ]]></author>
<dc:creator><![CDATA[ Elizabeth Christensen ]]></dc:creator>
<guid isPermalink="false">0fe99b43c2417b308d641253451cc38618f70b171a295266a2dd8108b823f133</guid>
<pubDate>Fri, 12 Sep 2025 08:00:00 EDT</pubDate>
<dc:date>2025-09-12T12:00:00.000Z</dc:date>
<atom:updated>2025-09-12T12:00:00.000Z</atom:updated></item></channel></rss>