Postgres Hosting Checklist
A list of considerations for choosing a Postgres hosting vendor.
The Postgres Software
Ensuring that your Postgres is actually Postgres is vital to getting the flexibility you expect from open-source software.
Using open-source Postgres?
- Is it actually Postgres, or something "Postgres-compatible"? PostgreSQL has had 30 years of active development, with a strong reputation for reliability, feature robustness, and performance. Using Postgres itself is very different from using something merely "compatible", particularly when things go wrong or require optimization. Read the fine print: there is no standard for what "Postgres-compatible" means.
Major version support
- Postgres releases major versions on a yearly cycle, unlocking significant performance and functionality improvements. How soon does your provider make new Postgres major versions, with all their improvements and fixes, available to you? Those offering "Postgres-compatible" databases take close to a year, and often stall at a specific version because they must backport changes.
Minor version availability
- Because Postgres is open source, its maintainers acknowledge security and data-integrity issues and release fixes promptly. While these are rare occurrences, it is important to apply the patches quickly. How quickly does your provider make new minor versions available?
Support
Is your team comfortable diagnosing Postgres performance? Have you run large databases before? Are you a consultancy with clients that wants to offload the database relationship?
Postgres focused v. infrastructure focused
- Support often costs extra. Are you getting support for Postgres or for the provider's product? Infrastructure vendors offer support, but it covers the underlying host, not the database your application actually uses. Often, their recommendation is to upgrade to a larger instance, at extra cost.
Support time to response
- Not all vendors offer support SLAs, and SLAs apply to response time, not resolution time. Is the promised response time adequate for your use case?
Extensions
Extensions are how Postgres incorporates additional functionality before (or instead of) its inclusion in core Postgres.
- Does the vendor provide the extensions you require? For scale, one of the most powerful extensions is pg_partman, which makes table partitioning simple. For mapping and GIS, PostGIS unlocks many features. Check out the available extensions so that you know what tools you have.
- Will the vendor discuss the possibility of adding an extension you need, or is the answer an automatic "no"?
- Extension updates are released rapidly, often for functionality, performance, or security reasons. Out-of-date extensions can be as big a liability as out-of-date Postgres.
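One quick check, assuming the vendor gives you psql access (the connection string and extension name below are placeholders), is to ask the server itself which extensions it offers:

```shell
# List every extension the server is willing to install.
psql "$DATABASE_URL" -c "SELECT name, default_version, comment
                         FROM pg_available_extensions ORDER BY name;"

# Install one you depend on (if the vendor allows it).
psql "$DATABASE_URL" -c "CREATE EXTENSION IF NOT EXISTS postgis;"
```

If an extension you rely on is missing from pg_available_extensions, that is the moment to start the conversation with the vendor.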
Monitoring & Management
In the cloud, systems fail. When that happens, what is the vendor's role in helping you recover?
Fully-Managed v. Self-Managed
- Fully-managed means the database vendor's team will assist in recovery should a host fail. Self-managed means you are left with self-monitoring and a restart button on a panel, hoping the host recovers quickly.
Integration with 3rd Party Monitoring
- Postgres monitoring requires a high level of access to the host, which not all vendors provide. This leaves 3rd party tools monitoring the nice-to-haves rather than the must-haves.
- Some providers offer a minimal set of monitoring statistics for a few days, charging extra for metrics that actually help your operations and application teams. Make sure you understand what is included and what is extra, so you know your total cost.
Connection Pooler
A connection pooler accepts connections from client applications while maintaining persistent connections to the underlying Postgres servers. It reduces the stress on database compute and memory resources that typically occurs when new connections are established. It also increases application availability by preserving application connections, and allows for flexibility in high-availability and disaster-recovery scenarios.
Support for pooling using the Postgres extended query protocol
- The extended query protocol is best practice, as it is more secure and performant. Some connection poolers permanently "pin" connections that use the extended query protocol to a single server, largely negating the benefits of a pooler.
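One way to probe this is to run pgbench through the pooler in both query modes and compare throughput. A sketch; the pooler hostname, port, user, and database are placeholders:

```shell
# Baseline using the simple query protocol.
pgbench -h pooler.example.com -p 6432 -U app -M simple -c 20 -T 30 mydb

# Same workload using the extended query protocol; a large drop in
# transactions per second can indicate connection pinning.
pgbench -h pooler.example.com -p 6432 -U app -M extended -c 20 -T 30 mydb
```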
Connection Pooler Costs
- Does your provider charge extra for a connection pooler? Do they charge per vCPU, for traffic? Do they charge more for endpoints?
Connection Pooler Upgrades
- Does your provider's connection pooler allow pooler version upgrades without having to drop existing connections? Does it closely follow Postgres versions and protocol upgrades?
API & Tools
The tool suite around the database can make or break your ability to integrate fully with your operations.
API documentation
- APIs come in all flavors: SOAP, REST, simple and complex. An API is more than a checkbox on a questionnaire: it must be tested. Building a good API experience requires intention from the vendor.
Command Line Interface (CLI)
- CLIs have been a growing force as the cloud has grown, because experienced developers can do more with the command line than with a UI. For instance, tailing a log file from a CLI lets you pipe it through additional tooling not found in a GUI.
- APIs should expose all the functionality of the database: create, destroy, health checks, backup, restore, logs, monitoring, and more. How programmable is the database environment?
Logging
People who have debugged database issues know to jump straight to the logs when there is a question. The logs hold the clues.
- If using a 3rd party log aggregator, ensure the vendor can ship logs to that specific aggregator. This supports any additional log-parsing automation you may build.
- When things are slow right now, what is the database saying? Real-time log tailing is a valuable tool in that moment.
Logs within the platform
- Databases are often configured today and debugged later, which means 3rd party logging may not have been configured to capture yesterday's performance issues. If logs are available in the platform itself, it will always have your back.
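For slow-query debugging specifically, check whether the vendor lets you tune Postgres's own logging. A sketch, assuming psql access with a role permitted to run ALTER SYSTEM (the connection string is a placeholder):

```shell
# Log every statement slower than 500 ms.
psql "$DATABASE_URL" -c "ALTER SYSTEM SET log_min_duration_statement = '500ms';"

# Apply the change without a restart.
psql "$DATABASE_URL" -c "SELECT pg_reload_conf();"
```

Whether these commands are even permitted is itself a useful data point about the vendor's access model.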
Access & Security
Role-based permissioning inside native Postgres amounts to superuser, then everyone else.
- Does the provider offer a real superuser or a limited pseudo-superuser? Combined with an unknown "Postgres-compatible" service, that leaves you with limited access to database software that no one outside the provider has access to, or has tested.
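You can verify what you were actually given by asking the catalog, assuming psql access (the connection string is a placeholder):

```shell
# A true superuser shows rolsuper = t; pseudo-superusers typically show
# rolsuper = f plus a collection of vendor-granted role memberships.
psql "$DATABASE_URL" -c "SELECT rolname, rolsuper, rolcreaterole, rolcreatedb
                         FROM pg_roles WHERE rolname = current_user;"
```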
Limitations of access
- For security purposes, Postgres hosting vendors put limitations on access. Knowing the explicit limitations is important for future growth.
Backup & Restore
These are not simple operations, and when you need them, you REALLY need them to be correct and to work. There are many caveats to proper backup & restore, and not all of them are immediately obvious.
- Postgres backups can be snapshots of the Postgres data directory, exports to SQL (pg_dump), or archives of transaction logs (WAL). Point-in-time restore (i.e. restoring the database to 5pm yesterday) requires a combination of snapshots and transaction logs. Knowing how backups happen is important for understanding capabilities.
- Are there additional costs for backup storage? Often hosting providers charge additional fees for backups or to extend the retention period of the backups.
- Some retention times are required by regulation, others by sound business practice. Check that the hosting provider retains snapshots and WAL files long enough for your use case.
Access to backup files
- Just because the hosting vendor backs up the files does not mean customers have access to them. Without access to the backup files, customers are stuck using pg_dump, which means restoring after a disaster will be slow and very limited in functionality.
- Restore processes depend on the capabilities of the vendor. Typical questions: do restores happen to new hosts, or overwrite the existing host? Is point-in-time restore actually possible? Can you restore to a different region?
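The backup styles above map to standard tooling, and whether you can run either of these against a managed cluster is a revealing test (hostnames, users, and paths below are placeholders):

```shell
# Logical export: portable across versions, but slow to restore at scale.
pg_dump "$DATABASE_URL" --format=custom --file=backup.dump

# Physical base backup with streamed WAL: the raw material for
# point-in-time restore; requires replication-level access.
pg_basebackup -h db.example.com -U replicator -D /backups/base -X stream -P
```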
High Availability
High-availability functionality is unique to each Postgres hosting vendor. Some use off-the-shelf tools, which limits tunability.
- There are a number of HA systems for Postgres. How quickly will the HA system respond and attempt to automatically recover the cluster? Make sure you test recovery often and have statistics on timing.
- Postgres can replicate using streaming, logical, or statement-based replication, each with pros and cons. Which does the hosting provider use, and does it fit your needs? Streaming replication is a high-integrity replication method that checks commits across the cluster against an internal timeline.
- Does the hostname change? Does the failover system have the same system resources as the initial host? When building systems that rely on the database, it is important to program your secondary systems to behave properly as well.
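Postgres exposes its own replication state, which helps both in checking what the vendor actually runs and in gathering failover statistics. A sketch, assuming psql access to both nodes (connection strings are placeholders):

```shell
# On the primary: one row per standby, including replication lag.
psql "$PRIMARY_URL" -c "SELECT client_addr, state, replay_lag
                        FROM pg_stat_replication;"

# On a standby: confirm recovery mode and the last WAL position replayed.
psql "$STANDBY_URL" -c "SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();"
```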
Disaster Recovery
With modern cloud high-availability systems, a failed host is no longer a disaster. Thus, disasters are client-side issues, like dropping a database, or vendor-side issues, like a data center failing.
Disaster recovery process
- Will the vendor assist with client-side failures, or only with vendor-side failures? If so, what is the initiation process, and what is the SLA for response time?
Customer-facing tools for client-side failures
- Most client-side failures can be resolved with point-in-time restore (i.e. restoring to the moment before the failure). Does the vendor include API programmability to take snapshots of your database prior to large migrations?
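In practice, that means something scriptable in a migration runbook. The endpoint, payload, and tokens below are entirely hypothetical; check what, if anything, your vendor exposes:

```shell
# Hypothetical vendor API: request an on-demand snapshot before a migration.
curl -X POST "https://api.example-vendor.com/v1/clusters/$CLUSTER_ID/snapshots" \
  -H "Authorization: Bearer $API_TOKEN" \
  -d '{"label": "pre-migration"}'
```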
Ability to Migrate
Not all vendor relationships work out. Many database hosting vendors make it super easy to import data, but exporting data is a downtime event.
Open Source Postgres
- Migrating from a Postgres-compatible database to an open-source Postgres database is not as easy as it sounds. Know that there probably is not a reverse switch.
- For large databases (greater than 100GB), it is typically best to replicate the database using streaming replication or access to WAL files; this enables minimal-downtime migrations. If the vendor claims pg_dump is the only migration path for larger databases, migration will require significant downtime.
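Where the source allows it, built-in logical replication is one minimal-downtime path. A sketch, assuming wal_level=logical on the source, network reachability, and that the schema has been loaded on the target first (connection strings and hostnames are placeholders):

```shell
# Load the schema on the target first (logical replication moves rows only).
pg_dump "$SOURCE_URL" --schema-only | psql "$TARGET_URL"

# Publish everything on the source...
psql "$SOURCE_URL" -c "CREATE PUBLICATION migrate_pub FOR ALL TABLES;"

# ...and subscribe from the target; once it catches up, cut over.
psql "$TARGET_URL" -c "CREATE SUBSCRIPTION migrate_sub
    CONNECTION 'host=old-vendor.example.com dbname=mydb user=app'
    PUBLICATION migrate_pub;"
```

Vendors that expose WAL or replication-level access make this path possible; vendors that only permit pg_dump do not.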