If you’ve ever dug under the hood of Postgres a bit, you’ve probably heard about the page. This is the on-disk storage mechanism and it's limited to an 8kb size. But what happens when you have data bigger than that 8kb? TOAST is made. Postgres TOASTs data by splitting it up into smaller chunks. TOAST stands for The Oversized Attribute Storage Technique.
TOAST happens automatically, you don’t set up anything, it just comes with Postgres out of the box. So why should you care?
Well TOAST can impact your query performance quite a bit. It adds some extra hurdles for Postgres to jump through when delivering data to your end users. So knowing how TOAST is made, when, and in some cases how to avoid it, is important for optimizing Postgres.
So let’s dig in a bit more into the Postgres page size and why things need to be toasted.
Postgres stores data in 8 kilobyte pages. If you’ve heard of huge pages, this is a different page size at the Linux kernel level, not the Postgres disk level. The postgres page size is not currently configurable.
Postgres wants to fill up the 8kb with tuples and this could be a couple big rows(tuples) or thousands of them, depending on the data inside. In general, Postgres prefers the biggest tuple that will fit in a page is 2kb. Anything larger than 2kb is up for consideration to be TOASTed.
So if you have a really big row, with lots of data or text in one of the columns, like this:
Postgres will add the data to several pages in segments and create a toast table that references these pieces. TOAST segments are 2KB too so you have about 4 per 8kb page in the toast table.
All of this happens with you knowing and when you query the data, you won’t know if the data comes through like that. Actually EXPLAIN plans won’t even show you this (hey, if anyone’s looking for a patch idea, this would be a nice feature). The TOAST table is automatically indexed, so the lookup is very fast when the data needs to be compiled.
If you’re wondering why Postgres does not compress the data before TOASTing it,
you’re on the right track and in fact Postgres does do this. Compression and
TOAST are pretty tightly coupled together. Compression settings and TOAST are
part of the underlying data type attributes (
attstorage). Data columns have 4
possible options for storage type and these are preset with the data type
|Attribute Storage Name
|Abbreviation in attstorage
|Kinds of data
|INTEGER data types, BOOLEAN, TIMESTAMPTZ, and other things that are small.
|This is going to be the default for TEXT data types, JSON/JSONB, Large Objects, and Binary data types.
|This is used for rollup tables in hyperloglog, and in custom data types where you’ve already compressed the data, for example, a custom image datatype.
|Used by NUMERIC.
If you want to see all your columns, storage types, and data types, here’s a query for that:
JOIN pg_attribute ON (
columns.table_name = pg_attribute.attrelid::regclass::text
AND columns.column_name = pg_attribute.attname
Postgres creates a TOAST table if you have text/varchar/jsonb/bytea in preparation for you later adding bigger data. So if you’re poking around, you’ll see a TOAST table for all of our text tables and it might have 0 pages if it has never been used yet.
Postgres is great at exposing lots of data, especially in the pg_class table, to fill you in on the TOAST information. Here’s a query to find out what source tables have TOAST tables with data and how big the TOAST tables are.
c.relname AS source_table_name,
c.relpages AS source_table_number_of_pages,
c.reltuples AS source_table_number_of_tuples,
c.reltoastrelid AS toast_table_oid,
t.relname AS toast_table_name,
t.relpages AS toast_table_number_of_pages,
t.reltuples AS toast_table_number_of_tuples
JOIN pg_class t ON c.reltoastrelid = t.oid
t.relpages > 0;
So TOAST is great and it means you can store a lot of big data in Postgres, including JSON, images, large objects, and more. However, there is a performance tradeoff and a few things to keep in mind.
So because of Postgres’ lovely MVCC framework that makes it incredibly accurate, anytime you update a single column of a TOASTed row, you have to update all the pages affected. If you’ve dealt with vacuum issues before, you’re already aware of how write amplification creates dead tuples that need to be cleaned up later. Where this really becomes an issue is if you’re using JSON and frequently updating one piece of the array. That’s probably going to be a really inefficient way to store and retrieve that data long term.
Another performance note to make here is that you should avoid compressing if you’re accessing something often. An example of this would be text needed for a full text search. If you compress that, full searches will be terribly inefficient.
While TOAST is a nice feature, it's not something you want happening to your frequently accessed and queried data. If you’re using large text blobs or JSON and can instead represent your data with structure in traditional columns, do that.
For spatial data, larger sets like polygons and longer string data will need to be toasted. This can make those queries slow and inefficient. You might want to break up larger sets of data in your storage sets by using functions like ST_Subdivide().
- Postgres TOASTs data if a row is larger than the 8kb page size. Toasting breaks up the data into smaller parts across multiple pages.
- Toasted data can also be compressed.
- Toasted data generally is slower to respond to queries.
- Postgres will toast data for you automatically and you don't need to do anything for occasionally toasted data.
- If you’re storing a lot of images, JSON, or data that’s often larger than 8kb per row, pay attention to your data type settings around toast and compression.
- Look at what you are TOASTing and if its data you frequently query, consider a data modeling change to break that data up into a way that will be more efficient for the database.
Credits: Thanks to Greg Sabino Mullane for his great TOAST presentation that gave me the idea for this summary post, and for the adorable hippo image.
January 2, 2024 •More by this author