Is Postgres Read Heavy or Write Heavy? (And Why You Should Care)
When someone asks about Postgres tuning, I always say “it depends”. What “it” is can vary widely, but one major factor is the read and write traffic of a Postgres database. Today let’s dig into how to tell whether your Postgres database is read heavy or write heavy.
Of course, read heavy or write heavy can often be inferred from your business logic: a social media app is read heavy, an IoT logger is write heavy. But many of us have mixed-use applications. Knowing your read and write load can help you make decisions about tuning and architecture priorities across your Postgres fleet.
Understanding whether a Postgres database is read-heavy or write-heavy is paramount for effective database administration and performance tuning. For example, a read-heavy database might benefit more from extensive indexing, query caching, and read replicas, while a write-heavy database might require optimizations like faster storage, efficient WAL (Write-Ahead Log) management, table design considerations (such as fill factor and autovacuum tuning) and careful consideration of transaction isolation levels.
By reviewing a detailed read/write estimation, you can gain valuable insights into the underlying workload characteristics, enabling informed decisions for optimizing resource allocation and improving overall database performance.
Reads and writes are not really equal
The challenge in looking at Postgres this way is that reads and writes are not equivalent units of work.
- Postgres reads data in whole 8kB units, called blocks on disk or pages once they’re part of the shared memory. The cost of reading is much lower than writing. Since the most frequently used data generally resides in the shared buffers or the OS cache, many queries never need additional physical I/O and can return results straight from memory (a quick way to check this is sketched just after this list).
- Postgres writes, by comparison, are a little more complicated. When changing an individual tuple, Postgres needs to write a WAL record describing the change. If this is the first write to a page after a checkpoint, this can include a copy of the full data page. It can also involve writing additional data for any index changes, TOAST table changes, or TOAST table indexes. This is the direct write cost of a single database change, and it happens before the commit is acknowledged. There is also the I/O cost of writing out all dirty buffer pages, but this is generally done in the background by the background writer. On top of these write I/O costs, the data pages need to be in memory in order to make changes, so every write operation has potential read overhead as well.
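A quick way to sanity-check how often reads are served from memory versus disk is the standard pg_stat_database view, which tracks blocks found in shared buffers (blks_hit) versus blocks read in from outside them (blks_read; those can still be satisfied by the OS cache). A minimal sketch:

SELECT datname,
       blks_hit,
       blks_read,
       -- share of block requests answered from shared buffers
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();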
That being said - I’ve worked on a query using internal table statistics that loosely estimates read load and write load.
Query Postgres for read and write traffic
This query leverages Postgres’ internal metadata to provide an estimation of the number of disk pages (or blocks) that have been directly affected by changes to a given number of tuples (rows). This estimation is crucial for understanding the read/write profile of a database, which in turn can inform optimization strategies (see below).
The query's logic is broken down into several Common Table Expressions (CTEs) to enhance readability and modularity:
ratio_target CTE:
This initial CTE establishes a predefined threshold. It allows the user to specify a target ratio of read pages per write page. This ratio serves as the primary criterion for classifying a database or table as either read-heavy or write-heavy.
I’ve set the ratio in the query to 5 reads : 1 write, which means that roughly 20% of the database activity would be writes in this case. This is a bit of a fudge factor, and the exact definition of what makes a database write-heavy may differ. If you set it to 100, the query would treat 100 reads as equivalent to 1 write (about 1% writes); the parameter exists so you can tweak the classification thresholds yourself.
By defining this threshold explicitly, the query provides a flexible mechanism for evaluating different performance characteristics based on specific application requirements. For instance, a higher ratio_target might indicate a preference for read-intensive operations, while a lower one might suggest a workload dominated by writes.
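For example, changing the first CTE of the query below to the line shown here treats 10 read pages as equivalent to one written page, so a table gets classified as write-heavy once writes account for more than roughly 9% of its page activity:

ratio_target AS (SELECT 10 AS ratio)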
table_list CTE
This CTE is responsible for the core calculations necessary to determine the read and write page counts. It performs the following key functions:
Total read pages:
It sums the heap and index blocks read for each table (heap_blks_read + idx_blks_read from pg_statio_user_tables). This metric is fundamental to assessing the read demand placed on the database.
Estimated changed pages for writes:
To estimate the number of pages affected by write operations, the table_list CTE utilizes the existing relpages (total pages) and reltuples (total tuples) statistics from the pg_class system catalog. By calculating the ratio of relpages to reltuples, the query derives an estimated density of tuples per page. This density is then applied to the observed number of tuple writes to project how many physical pages were likely impacted by these write operations. This approach provides a practical way to infer disk I/O related to writes without needing to track every individual page modification.
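As a rough worked example (the numbers are illustrative, loosely based on the audit_logs row in the sample output below): if pg_class showed reltuples of about 1,500,000 and relpages of about 18,500, that is roughly 81 tuples per page, so 1,500,000 tuple writes are estimated to have touched about 1,500,000 / 81 ≈ 18,500 pages. You can eyeball the density for a given table directly:

SELECT relname,
       relpages,
       reltuples,
       -- nullif avoids division by zero for empty tables;
       -- reltuples is 0 or -1 before the first VACUUM/ANALYZE
       round((reltuples / nullif(relpages, 0))::numeric, 1) AS est_tuples_per_page
FROM pg_class
WHERE relname = 'audit_logs';  -- table name taken from the sample output below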
Final comparison and classification
After the table_list CTE has computed the estimated read pages and write-affected pages, the final stage of the query involves a comparative analysis. The calculated number of read pages is directly compared against the estimated number of write pages. Based on this comparison, and in conjunction with the ratio_target defined earlier, the query then classifies each table (or the database as a whole) into one of several categories. These categories typically include:
- Read-heavy: This classification is applied when the proportion of read pages significantly outweighs the write pages, based on the defined ratio_target.
- Write-heavy: Conversely, this classification indicates that write operations are more prevalent, with a higher number of write-affected pages relative to read pages.
- Other scenarios: The query can also identify other scenarios, such as balanced workloads where read and write operations are roughly equivalent, or cases where the data volume is too low to make a definitive classification.
The read/write Postgres query:
WITH
ratio_target AS (SELECT 5 AS ratio),
table_list AS (
    SELECT
        s.schemaname,
        s.relname AS table_name,
        -- Sum of heap and index blocks read from disk (from pg_statio_user_tables)
        si.heap_blks_read + si.idx_blks_read AS blocks_read,
        -- Sum of all write operations (tuples) (from pg_stat_user_tables)
        s.n_tup_ins + s.n_tup_upd + s.n_tup_del AS write_tuples,
        -- Estimated pages touched by writes: tuple writes scaled by pages per tuple
        -- (reltuples is 0 or -1 before the first VACUUM/ANALYZE, so guard against it)
        c.relpages * (s.n_tup_ins + s.n_tup_upd + s.n_tup_del)
            / (CASE WHEN c.reltuples <= 0 THEN 1 ELSE c.reltuples END) AS blocks_write
    FROM
        -- Join the user tables statistics view with the I/O statistics view
        pg_stat_user_tables AS s
        JOIN pg_statio_user_tables AS si ON s.relid = si.relid
        JOIN pg_class c ON c.oid = s.relid
    WHERE
        -- Filter to only show tables that have had some form of read or write activity
        (s.n_tup_ins + s.n_tup_upd + s.n_tup_del) > 0
        OR (si.heap_blks_read + si.idx_blks_read) > 0
)
SELECT *,
    CASE
        -- Handle case with no activity
        WHEN blocks_read = 0 AND blocks_write = 0 THEN
            'No Activity'
        -- Handle write-heavy tables
        WHEN blocks_write * ratio > blocks_read THEN
            CASE
                WHEN blocks_read = 0 THEN 'Write-Only'
                ELSE ROUND(blocks_write::numeric / blocks_read::numeric, 1)::text || ':1 (Write-Heavy)'
            END
        -- Handle read-heavy tables
        WHEN blocks_read > blocks_write * ratio THEN
            CASE
                WHEN blocks_write = 0 THEN 'Read-Only'
                ELSE '1:' || ROUND(blocks_read::numeric / blocks_write::numeric, 1)::text || ' (Read-Heavy)'
            END
        -- Handle balanced tables
        ELSE '1:1 (Balanced)'
    END AS activity_ratio
FROM table_list, ratio_target
ORDER BY
    -- Order by the most active tables first (sum of all operations)
    (blocks_read + blocks_write) DESC;
Results will look something like this:
 schemaname |  table_name   | blocks_read | write_tuples | blocks_write | ratio |     activity_ratio
------------+---------------+-------------+--------------+--------------+-------+------------------------
 public     | audit_logs    |           2 |      1500000 |        18519 |     5 | 9259.5:1 (Write-Heavy)
 public     | orders        |           8 |            4 |            0 |     5 | Read-Only
 public     | articles      |           2 |           10 |            1 |     5 | 0.5:1 (Write-Heavy)
 public     | user_profiles |           1 |            3 |            0 |     5 | Read-Only
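If you want a single database-wide ratio rather than a per-table breakdown, the same catalog views can be summed across all user tables. A sketch, using a condensed version of the table_list CTE from above:

WITH table_list AS (
    SELECT
        si.heap_blks_read + si.idx_blks_read AS blocks_read,
        c.relpages * (s.n_tup_ins + s.n_tup_upd + s.n_tup_del)
            / (CASE WHEN c.reltuples <= 0 THEN 1 ELSE c.reltuples END) AS blocks_write
    FROM pg_stat_user_tables AS s
    JOIN pg_statio_user_tables AS si ON s.relid = si.relid
    JOIN pg_class c ON c.oid = s.relid
)
SELECT
    sum(blocks_read) AS total_blocks_read,
    round(sum(blocks_write)::numeric) AS total_blocks_write,
    -- overall read pages per write-affected page
    round(sum(blocks_read)::numeric / nullif(sum(blocks_write), 0)::numeric, 1) AS reads_per_write_page
FROM table_list;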
pg_stat_statements
Another way to look at read and write traffic is through the pg_stat_statements extension. It aggregates statistics for every unique query run on your database, including the number of rows each query read or wrote and the shared buffer blocks it hit or had to read in.
While the catalog query above accounts a bit better for how the workload is distributed across tables, pg_stat_statements is a good cross-check on overall traffic volume.
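If the extension isn’t already enabled, it has to be preloaded (which requires a server restart) and then created once in the database you want to inspect. Standard setup, shown here via ALTER SYSTEM rather than editing postgresql.conf by hand:

ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';  -- restart required
-- after the restart:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;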
SELECT
    SUM(shared_blks_hit) AS cache_hits,
    SUM(shared_blks_read) AS disk_reads,
    SUM(CASE WHEN query ILIKE 'SELECT%' THEN rows ELSE 0 END) AS rows_read,
    SUM(CASE WHEN query ILIKE 'INSERT%' OR query ILIKE 'UPDATE%' OR query ILIKE 'DELETE%' THEN rows ELSE 0 END) AS rows_written
FROM pg_stat_statements;
cache_hits | disk_reads | rows_read | rows_written
------------+------------+-----------+--------------
27586 | 998 | 443628 | 30
(1 row)
Performance Tuning for High Write Traffic in Postgres
For write-heavy systems, the bottleneck is often I/O and transaction throughput. You're constantly writing to the disk, which is slower than reading from memory.
- Faster Storage: The most direct way to improve write performance is to use faster storage, such as NVMe SSDs, and provision more I/O operations per second (IOPS).
- Postgres 18 now has asynchronous I/O, which should be more performant than the traditional synchronous approach.
- More RAM: While reads benefit from RAM for caching too, writes also benefit from a larger shared_buffers pool, which can hold more dirty pages before they need to be flushed to disk.
- I/O burst systems: Many cloud-based systems offer burstable I/O on top of a baseline, so checking those limits against your write volume can also be helpful.
- Minimize Indexes: While essential for reads, every index must be updated during a write operation. Over-indexing can significantly slow down writes, so remove unused indexes.
- Utilize HOT updates: Postgres can skip index maintenance entirely for updates that don’t change any indexed columns and fit on the same page (heap-only tuple, or HOT, updates), so lowering a table’s fillfactor to leave room for these could be worth looking into for frequently updated tables.
- Tune the WAL (Write-Ahead Log): Every change is recorded in the WAL before it’s persisted to the main data files. Tuning parameters like wal_buffers can reduce the number of disk flushes and improve write performance.
- Optimize Checkpoints: Checkpoints sync dirty data from shared memory to disk. Frequent or large checkpoints can cause I/O spikes. Adjusting checkpoint_timeout and checkpoint_completion_target can smooth out these events (a sample settings sketch follows this list).
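As a rough illustration of the knobs mentioned above, here is a sketch of write-oriented settings applied via ALTER SYSTEM. The values are purely illustrative starting points, not recommendations; the right settings depend on your hardware, WAL volume, and recovery requirements:

-- illustrative values only; tune for your own hardware and workload
ALTER SYSTEM SET shared_buffers = '8GB';               -- requires a restart
ALTER SYSTEM SET wal_buffers = '64MB';                 -- requires a restart
ALTER SYSTEM SET max_wal_size = '8GB';                 -- allow more WAL between checkpoints
ALTER SYSTEM SET checkpoint_timeout = '15min';         -- fewer, larger checkpoints
ALTER SYSTEM SET checkpoint_completion_target = 0.9;   -- spread checkpoint I/O across the interval
SELECT pg_reload_conf();                               -- applies the reloadable settings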
Performance tuning for read traffic
For read-heavy systems, the primary goal is to get data to the user as quickly as possible, ideally keeping as much data as possible in the buffer cache so queries aren’t reading from disk.
- Effective Caching: Ensure your shared_buffers and effective_cache_size are configured to take advantage of available RAM. This lets Postgres keep frequently accessed data in memory, avoiding costly disk reads.
- Optimize Queries and Indexes: Use EXPLAIN ANALYZE to pinpoint slow SELECT queries and add indexes on columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses (a short sketch follows this list). Remember, indexes speed up lookups at the cost of slower writes.
- Scaling out with read replicas: A read replica is a copy of your primary database that's kept in sync asynchronously. All write operations go to the primary, but you can distribute read queries across one or more replicas. This distributes the read load, offloads traffic from your primary server, and can dramatically improve read throughput without impacting your write performance.
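A minimal sketch of that workflow, using the orders table from the sample output earlier and a hypothetical customer_id column:

-- see whether the planner uses an index and where the time goes
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- if a large table is being sequentially scanned, try an index;
-- CONCURRENTLY avoids blocking writes while the index builds
CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);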
Most Postgres databases are read heavy
Most Postgres databases are going to be far more read heavy than write heavy. Based purely on experience, I’d say that once the ratio drops to around 10:1 reads to writes, a database is starting to get write heavy. Of course, there are outliers.
The right scaling strategy depends entirely on your workload. By proactively monitoring your Postgres stats using internal statistics in the Postgres catalog, you can make informed decisions that will keep your database healthy and your application fast.
Co-authored with Elizabeth Christensen