TL;DR: keep your data-layer simple, you'll need the agility it offers when budgets get stretched.
Over the past 12 years of helping people run databases for their businesses, I have discovered that there are only 2 financial phases in the lifecycle of a database:
- Phase 1: Absolute costs are low, thus marginal costs do not matter.
- Phase 2: Absolute costs are high, thus marginal costs are all that matter.
I'm calling this difference between Phase 1 and Phase 2 the "database financial gap."
The first hint there is a problem comes from questions about optimizations. Developers & DBAs have questions about indexes, RAM, connection management, schema design, sharding, and many more technical ideas. Many think they are solving technical problems, but are solving a money problem.
Why is it a money problem? If money is not the problem, then throw hardware at the problem. Yet, because they seek to optimize, it indicates they are choosing not to throw hardware at the problem.
This is the honeymoon phase of running a database. Costs of running a database are insignificant1 for the business. The situation in this phase is:
- Small number of application & background processes
- Data sizes are small
- Peak per-second queries & writes are in the low double digits
- Hardware overpowers the use-case
- A single-node can handle the usage
Mistakes during this phase are not terminal. Common mistakes include:
- Queries can run without indexes.
- Misoptimization of data-structures are insignificant2
- Table-scans are not hammering the iOPS because the number of rows is relatively constrained.
- Connection pooling does not matter because there are limited application processes.
- ORMs have not led you astray with sub-optimal queries3
During this phase, very few things can increase the costs too high to cause pain.
I've seen this phase end with
Accrued technical debt of database decisions has a chance of becoming actual debt. Now, it's a race between knowledge of database interaction 4, ability for hardware to overcome poor choices, and money. Knowledge, hardware, and money are the 3 levers.
Optimizations come in many flavors from queries to connections pooling to schema design to sharding. Knowledge is typically slower to execute than hardware or money -- particularly internal knowledge. The process is bound by iterating over the following:
- Amount of time it takes to understand the problem, and form a hypothesis
- Amount of time it takes to implement the change
- Amount of time it takes to confirm the change performed
- The impact of the change. Did it solve the entire problem? The likely answer is "no", because it's never just one problem. Worse case scenario, it's an adverse impact5.
All Crunchy Data customers have access to expert support to augment these decisions. We have seen enough databases to get you to the correct answer faster.
In 2022, we have a plethora of hardware options at affordable prices to overcome poor decisions. SSDs are incredibly performant and reasonably priced. RAM is as cheap as ever. CPUs have more cores than ever, which helps on reads, but not on writes.
So, when all else fails, throw hardware at the database. The ultimate outcome of throwing more hardware at it are:
- Buy time to implement the optimization changes
- Hardware becomes too expensive for the businesses
- No single hardware is large enough to support scale of database6
Hardware is always an option. The first time hardware is upgraded, it is probably the right choice. After upgrading the hardware a few times without optimizing, it's best to see if there are some obvoius optimizations that can be made. Check out Demystifying Database Performance for Developers or Postgres Indexes for Newbies.
Money is used to buy hardware and knowledge. Bigger hardware is buying time. Contracting a consultant to point to potential changes to your codebase is buying knowledge.
If you only have $1 to spend, do you spend it on hardware or knowledge? Use Crunchy Bridge, our hosted Postgres, and get both.
Fixing one problem at this phase often exposes the next problem. Some compare it to "wack-a-mole", but I compare it to playing in a stream of water. When releasing water from one blockage, increased water flow puts stress on the next blockage. Certain parts of a system may see healthy until they are exposed to full demand.
When moving from Phase 1 to Phase 2, the greatest tool will be the ability to change quickly and inexpensively. We'll call this "data agility".
Why keep it simple stupid? The fastest change in technology is changing from the simple to the complex. I've seen many companies make the jump from Phase 1 to Phase 2. Those that move the quickest are the ones with the simplest stack to start.
How is native SQL agile?
Change how data is stored
Tools in the SQL bag: add / remove / combine a field, optimize a field, partition a table, denormalize a field, shard across database hosts application sharding, split out use cases into databases that handle the job better, like caching / queueing
Change how data is retrieved
Tools in the SQL bag: add / remove / combine indexes, rebuild query to better use indexes (EXPLAIN), refactor expensive queries from ORM to native SQL
Measurement & Observability
Tools in the SQL bag: slow-query logging, CPU / iOPS / RAM usage, application level monitoring7
Application interactions with database
Tools in the bag: batch inserts, PGBouncer, serialize background processes, specialized API for certain tasks
The team at Crunchy are experts at scaling with respect to costs. We are experts because we have made the jump ourselves, and helped many companies make the jump from Phase 1 to Phase 2. Our products (Bridge, Kubernetes, and Postgres HA) encapsulate that knowledge into the toolsets so that the problems are either avoided or quickly mitigated.
The Crunchy stack is built to help you maximize performance-to-cost.
Different businesses have different measures of "insignificant", so it may be
500, or 50k+ (if it does, we would love to talk to you). ↩
Mis-optimization of data structures is like using relational schema in a non-relational database, implementation of caching before caching is really necessary, sharding before understanding the proper shard structure. ↩
I am actually a fan of ORMs in nearly all situations, but that is for another blog post. ↩
I say "database interaction optimizations" because a database is never inherently slow, nor fast. It just is. The "slow" or "fast" comes from the interaction of the application with the capabilities and constraints of the database. ↩
Impact of Δ is a totally unbounded negative or positive number. I have seen "improvements" grind a database to a halt. Such improvements are typically of the vein "increase the number of app runners" or "add indexes to every field". ↩
No single hardware is large enough to support scale of database. That phrase is limited to a very small set of databases. If it does match your company, we would love to help you! ↩
How should I measure my database performance? The top-line measure of database performance is measuring your application performance. Every application has a different SLA (stated and unstated), and getting close to that SLA is the best understanding of performance of the stack. ↩
June 13, 2022 •More by this author