Connection pooling and management is one of those things most people ignore far too long when it comes to their database. When starting out, you can easily get by without it. With 1 or 2 application servers spawning 5-10 connections, even the tiniest of Postgres servers can handle such. Even with our $35 a month hobby plan on Crunchy Bridge, we can push 5,000 transactions per second through which is quite a bit for < 20 connections.
As you grow into the hundreds, better connection management is a quick and easy win. Most think you only need a connection pooler at massive scale when you're scaling reads beyond a single node, but the best time is far before that. Let's dig into the three variations of connection pooling and how to identify if you can benefit from a connection pooler and where.
Most web frameworks these days support server multiple web requests at a time; they do this via multiple threads or workers. Each backend of your web server serving up a request may need a database connection. So if you have a single web server running, but have 2 x number of cores (say 4) workers running, you'll have 8 backends at a time attempting to connect to your database.
Each new connection to your database is going to take some time to establish, often because of TLS negotiation. By connecting fresh each request, you may be spending 50ms connecting to your database to execute a 1ms query (lots of wasted time there). So the result is to tweak your database connection library to "persist" connections or maintain a pool of them. Each library differs a bit here, but regardless, having some standard pool of connections already open and connected from your application framework can help in overall performance per request.
But this where adding one solution introduces another problem.
With the above application side pooling in place you may be fine with one or two web servers. But when you scale to 20 web servers, you may now have 200 connections open to your database. 99% of the time these connections are sitting there idle not doing anything. Those wasted connections have some extra overhead in coordination each time a query is executed. But worse yet, they sit and consume wasted memory, which is one of the most valuable resources for your database.
A connection pooler that sits on a server near your database, or on the same server in front of your database can help on these idle transactions. A connection pooler like pgBouncer to your application looks exactly like Postgres. But sits between Postgres and your database and does the heavy lifting of giving out connections as they're needed. So your application issues a request to get a connection, pgBouncer says sure here you go, but pgBouncer will not pass the connection to the actual Postgres database until you start to execute a query.
The best way to tell if you would benefit from pgBouncer is by checking for idle connections in your database. If you're in the high 10s or if you have more idle than active connections it can be a big benefit:
SELECT count(*), state FROM pg_stat_activity GROUP BY 2; count | state -------+------------------------------- 7 | active 69 | idle 26 | idle in transaction 11 | idle in transaction (aborted) (4 rows)
Often when people hear connection pooling they think: scaling out your reads across multiple databases. Here you have some routing that will send reads to a particular read replica while writes continue to go to the primary. There are some Postgres tools that help with this (pgPool), but they tend to come with a bit of expert usage required.
Increasingly, this is being built into more and more frameworks. Django for example, doesn't have this turnkey but you can specify routing for various models in a pretty straightforward way. Rails has some libraries that help with routing via specific models.
A few tips for when/if you do need to go down this path:
- Consider doing it within the application/framework level.
- Try to group reads by model as much as possible (this will keep more of the same data in hot memory).
In scaling your application the database is one area where there are often 10-15 basic steps you can take to get to a much larger scale. Connection pooling both in the application side and server side via pgBouncer are one such step you should always take for production applications.
We're excited to support built-in connection pooling with pgBouncer on Crunchy Bridge, as a fully managed database as a service. We want to take care of all the headache of maintaining a database for you. And within Crunchy Postgres, whether you are running on bare metal VMs or leveraging our Kubernetes Operator, pgBouncer is available for you there as well. In short, we use pgBouncer and you should too.
April 16, 2021 •More by this author