Last week I was on a call with someone giving an overview of Crunchy Bridge, our multi-cloud fully managed database as a service. During the call they asked about what was the best way to get a sense of how their database was doing, a health check if you will. I mentioned a few monitoring tools such as pganalyze and pgMonitor that were good options, but that we could also dig in live and see how things were.
The person I was talking to was very similar to many of the customers we run databases for - an application developer, without a full time DBA on staff. They needed a little extra guidance and support and they were outgrowing their existing hosting provider, so we dug in to do a bit of spring cleaning and assessment on their database.
While I’m generally happy to spend a few minutes with anyone interested to talk about how Crunchy could help and to dig into your database with you, I’m also a fan of self-servicing. Here’s a walk through of each of the pieces we looked at (all within a few minutes) to help give some actionable steps on how to improve their database performance.
The first thing you're going to want to look at is your cache hit ratio and index hit ratio. Your cache hit ratio is going to give the percentage of time your data is served from within memory vs. having to go to disk. Generally serving data from memory vs. disk is going to orders of magnitude faster, thus the more you can serve from memory the better. For a typical web application making a lot of short requests I'm going to target > 99% here.
sql SELECT 'index hit rate' AS name, (sum(idx_blks_hit)) / nullif(sum(idx_blks_hit + idx_blks_read),0) AS ratio FROM pg_statio_user_indexes UNION ALL SELECT 'table hit rate' AS name, sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read),0) AS ratio FROM pg_statio_user_tables;
In their case, the cache hit ratio looked good and healthy. Had it been something like 80% or 90%, the solution is simple: get your database more memory by upgrading to the next step up.
From here we went ahead and took a look at the indexes and how often they're used. When you query a database the default behavior is going to be: scan the entire table to find records that match. This works fine when your table is 10 records, but for any sizable application scanning millions of records is simply not feasible. It is likely you've added some indexes as you've gone along to speed things up. In the below query you'll see how often you used an index when querying each respective table:
sql SELECT relname, CASE idx_scan WHEN 0 THEN 'Insufficient data' ELSE (100 * idx_scan / (seq_scan + idx_scan))::text END percent_of_times_index_used, n_live_tup rows_in_table FROM pg_stat_user_tables ORDER BY n_live_tup DESC; relname | percent_of_times_index_used | rows_in_table -----------------------+-----------------------------+--------------- account | 11 | 5409 deploy | 69 | 58276 app | 93 | 5345 team | 98 | 1801 firewall_rule | 9 | 984 ...
Personally I like use a rough guideline of:
- Rows with greater than 1000 or 10000 records
- Percentage of index used greater than 90%
Here I've got a couple of things that would be worth digging in further to
adding some indexes. The first is the
account table. While the
only has 5,409 records it is likely to continue growing and is clearly missing
some needed indexes since only 11% of the time an index is used when querying
that table. The other table of interest is the
deploy table. As we can see the
deploy table is the largest one. The
deploy table has some indexes as they
are used 69% of the time, but some workflow in the table is querying and not
having an index that helps.
Once we saw that their indexing was in good shape we shifted to what we could clean-up. As an application grows and matures you likely have some things you did early on that are now and unnecessary tax. The thing to look for here is your unused indexes-these are indexes that exist within the database but your queries are not utilizing them. This means you're getting slower throughput on writes with well no benefit from the index existing.
sql SELECT schemaname || '.' || relname AS table, indexrelname AS index, pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size, idx_scan as index_scans FROM pg_stat_user_indexes ui JOIN pg_index i ON ui.indexrelid = i.indexrelid WHERE NOT indisunique AND idx_scan < 50 AND pg_relation_size(relid) > 5 * 8192 ORDER BY pg_relation_size(i.indexrelid) / nullif(idx_scan, 0) DESC NULLS FIRST, pg_relation_size(i.indexrelid) DESC;
While you want to go through removing these indexes with caution you’re usually quite safe to remove these extra indexes that have no usage and are simply slowing down your throughput.
One very common habit when you are starting out is to put everything in your
database. This isn’t all bad, Postgres is quite versatile and can
handle a lot of what you can
throw at it. But some very specific tables often can make sense in other data
stores, if you have a table called
events in your Postgres
database there is a good chance it’s consuming the bulk of your storage, is your
largest table, and could benefit from being moved outside of your main Postgres
I’ve said for many years pg_stat_statements is one of the most useful extensions for Postgres. It records queries that are run against your database and a number of stats about them. It is a great place to go once a month or so and look up what queries are slowest on average as well as ones that consume the most total time against your system.
We’ll dig into
pg_stat_statements more in the future, but for now if you want
a quick look this query can give you a preview:
SELECT (total_time / 1000 / 60) as total_minutes, (total_time/calls) as average_time, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 50;
You don’t need to understand how MVCC underlies Postgres, or how to determine during a failover that it is safe to failover to have a well tuned Postgres database. Much of the tuning and improvements that can happen for your database aren’t the result of transaction logs and linux huge page tuning, rather it can be identifying N+1 queries, or just paying attention to wasteful things like “logs” consuming space and not providing value. Walking through the above, without advanced knowledge of Postgres should give you a good starting point to clean up.
And as always if you do find yourself wanting extra support and guidance in your Postgres journey, we’re always happy to chat here @crunchydata.
February 2, 2021 •More by this author