Thinking Fast vs. Slow with Your Data in Postgres

Today, we want to address some basic principles for better managing data architecture. Postgres is well regarded as a database for a traditional system of record. More recently we've been fielding questions on what else it can do, such as: Can it be good for analytics and metrics? The short answer is "yes". When applications expand beyond their standard system of record, they add new types of data and data stores, which introduces the complexity of managing multiple types of systems.

Some common workloads for Postgres are:

Primary system of record
Metrics / analytics data
Log / event data

If you know the toolset, Postgres gracefully handles versatile workloads. We've talked a lot about how flexible it is in terms of features. By combining those features, the database is equally flexible across multiple use cases.

We use Postgres to power the daily performance metrics system for Crunchy Bridge. Postgres works quite well for this, as I laid out in the Twitter thread above. The details of Crunchy Bridge's metrics system are:

Ingesting 50M events daily, without breaking much of a sweat
Currently a basic metrics schema with Postgres partitioning, and a few key indexes
In the future we'll do aggregations that leverage pg_cron and (possibly) hyperloglog
By running the metrics data on a different cluster, differing workloads are not competing for the same resources
Because it's all "just Postgres", we connect metrics into our system of record with a foreign data wrapper

Using simple, built-in tools avoids premature optimizations, and knowing the next steps allows us to focus on continued investment in features that make a positive impact for our customers.

Let's look at the three workloads we call out above to see what's unique / different about them.

Primary system of record

A "system of record" is the typical design for the specifics of your application. If you are building a CRM, the data consists of accounts, contacts, and opportunities. If, like Crunchy Data, you are building a DBaaS, the data consists of tables of accounts, teams, clusters, and networks. The characteristics of a system of record are:

Smaller working set of data that often can fit in memory
Support for small, fast queries, with single-digit millisecond latency or less
Data consistency ensured with primary keys, foreign keys, and data validity constraints
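
To make the consistency point concrete, here is a minimal sketch of a primary key, a foreign key, and a validity constraint rejecting bad rows. It runs against Python's built-in sqlite3 module only so that it works without a server; the `PRIMARY KEY`, `REFERENCES`, and `CHECK` clauses are written the same way in Postgres. The `teams` and `clusters` tables and their columns are hypothetical, loosely modeled on the DBaaS example above.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked; Postgres always does.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every cluster must belong to an existing team
# and must have a positive amount of storage.
conn.executescript("""
CREATE TABLE teams (
    id   integer PRIMARY KEY,
    name text NOT NULL UNIQUE
);
CREATE TABLE clusters (
    id         integer PRIMARY KEY,
    team_id    integer NOT NULL REFERENCES teams (id),
    storage_gb integer NOT NULL CHECK (storage_gb > 0)
);
""")

conn.execute("INSERT INTO teams (id, name) VALUES (1, 'platform')")
conn.execute("INSERT INTO clusters (id, team_id, storage_gb) VALUES (1, 1, 100)")


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A cluster pointing at a team that does not exist violates the foreign key.
orphan_rejected = rejected(
    "INSERT INTO clusters (id, team_id, storage_gb) VALUES (2, 99, 100)"
)
# A cluster with zero storage violates the CHECK constraint.
invalid_rejected = rejected(
    "INSERT INTO clusters (id, team_id, storage_gb) VALUES (3, 1, 0)"
)
cluster_count = conn.execute("SELECT count(*) FROM clusters").fetchone()[0]
```

Only the one valid cluster row is stored; the database itself, rather than application code, refuses the other two.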

Metrics/analytics system

With metrics or analytics systems you're going to ingest a larger amount of data. Typical characteristics of these systems are:

Consistently growing write volume
Larger amount of data that is unlikely to fit entirely in memory
Customer interactions require responsive queries
Older data is not read as frequently as new data

Large ingest systems that have time to mature typically move through a few stages:

Start with raw inserts
Move to multi-row inserts
Micro-batch with COPY
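
The later two stages can be sketched as follows. This is a rough illustration rather than our ingest code: the `events` table, its columns, and the sample batch are hypothetical, and the `%s` placeholders assume a psycopg-style driver. The code only builds the statement and the payload, so it runs without a database; a driver would then send them.

```python
import csv
import io

# Hypothetical table and columns, used only for illustration.
TABLE = "events"
COLUMNS = ("customer_id", "event_type", "value")


def multi_row_insert(rows):
    """Build one parameterized INSERT that covers every row in the batch."""
    if not rows:
        raise ValueError("need at least one row")
    row_placeholder = "(" + ", ".join(["%s"] * len(COLUMNS)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        TABLE, ", ".join(COLUMNS), ", ".join([row_placeholder] * len(rows))
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


def copy_payload(rows):
    """Serialize a micro-batch as CSV text suitable for COPY ... FROM STDIN."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# The statement a driver would run before streaming the payload.
COPY_SQL = "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(
    TABLE, ", ".join(COLUMNS)
)

batch = [(1, "cpu", 0.42), (1, "mem", 0.77), (2, "cpu", 0.13)]
insert_sql, insert_params = multi_row_insert(batch)
payload = copy_payload(batch)
```

A multi-row insert saves a round trip per row, while COPY also skips per-statement parsing overhead, which is why it tends to be the last stop for high-volume ingest.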

Beyond ingestion, querying raw data will become slower and slower as the data grows. To keep read queries performant, use:

Partitioning so you don't have to always query the entire dataset
Limited number of targeted indexes, like customer_id and event_type
Rollups (saving the output of queries) to precompute the views that most matter to your customers
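
As a sketch of how these three pieces might look in practice, the snippet below generates daily partition DDL and holds example index and rollup statements. Everything named here is an assumption for illustration: an `events` parent table declared with `PARTITION BY RANGE (occurred_at)`, an `events_daily_rollup` table with a unique constraint on `(day, customer_id, event_type)`, a single composite index as one plausible indexing choice, and psycopg-style `%s` placeholders.

```python
from datetime import date, timedelta

# Hypothetical parent table, assumed to be created elsewhere with
# PARTITION BY RANGE (occurred_at).
PARENT = "events"


def daily_partition_ddl(day):
    """Return DDL for a partition that holds exactly one day of rows."""
    next_day = day + timedelta(days=1)
    name = "{}_{}".format(PARENT, day.strftime("%Y_%m_%d"))
    # Range partition bounds are inclusive at FROM and exclusive at TO.
    return (
        "CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent} "
        "FOR VALUES FROM ('{start}') TO ('{end}');"
    ).format(
        name=name, parent=PARENT, start=day.isoformat(), end=next_day.isoformat()
    )


def upcoming_partitions(start, days):
    """Return DDL for `days` consecutive daily partitions beginning at `start`."""
    return [daily_partition_ddl(start + timedelta(days=i)) for i in range(days)]


# One targeted index; created on the parent, it applies to each partition.
INDEX_DDL = (
    "CREATE INDEX IF NOT EXISTS events_customer_event_idx "
    "ON events (customer_id, event_type);"
)

# Rollup: store per-day counts so dashboards do not rescan raw events.
# Re-running it for the same window overwrites the earlier counts.
ROLLUP_SQL = """
INSERT INTO events_daily_rollup (day, customer_id, event_type, event_count)
SELECT occurred_at::date, customer_id, event_type, count(*)
FROM events
WHERE occurred_at >= %s AND occurred_at < %s
GROUP BY 1, 2, 3
ON CONFLICT (day, customer_id, event_type)
DO UPDATE SET event_count = EXCLUDED.event_count;
"""

statements = upcoming_partitions(date(2024, 1, 31), 2)
```

Creating partitions ahead of time and re-running the rollup on a schedule is the kind of job a scheduler such as pg_cron could own.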

Logging and event tracking in Postgres

Logging and event tracking is typically less user-facing, and serves the purpose of an audit log. This data is a full record of all the logs and events that happen within the system. The characteristics of this data are:

Ingest requirements similar to metrics / analytics, however
Query patterns can be more complex, and
Query responsiveness can be slower
Budget requirements favor cheaper durable storage for this data
New use cases for this data need the flexibility to add a new index

Architecting your data

Can Postgres work as a tool for X? Yes. Postgres is a great tool that is powerful and flexible. Separate workloads onto different clusters, then use the Postgres foreign data wrapper to reconnect them. This approach avoids premature optimization in many cases, and can be trusted to scale as you need.
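
To illustrate the reconnecting step, here is a minimal sketch that assembles the statements you would run on the system-of-record cluster to expose a table from the metrics cluster through `postgres_fdw`. The server name, host, database, user, schema, and table are all hypothetical, the password is deliberately left as a placeholder, and the plain string formatting is for readability only and does no identifier quoting.

```python
def fdw_setup_sql(server, host, dbname, remote_user, local_schema, tables):
    """Return the statements that expose remote tables through postgres_fdw."""
    return [
        # The extension ships with Postgres but must be enabled per database.
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        # Describe where the remote cluster lives.
        "CREATE SERVER {} FOREIGN DATA WRAPPER postgres_fdw "
        "OPTIONS (host '{}', dbname '{}');".format(server, host, dbname),
        # Map the local role to remote credentials (placeholder password).
        "CREATE USER MAPPING FOR CURRENT_USER SERVER {} "
        "OPTIONS (user '{}', password '<password>');".format(server, remote_user),
        # Keep the foreign tables in their own local schema.
        "CREATE SCHEMA IF NOT EXISTS {};".format(local_schema),
        # Pull in only the listed tables from the remote public schema.
        "IMPORT FOREIGN SCHEMA public LIMIT TO ({}) FROM SERVER {} INTO {};".format(
            ", ".join(tables), server, local_schema
        ),
    ]


# Hypothetical values for illustration only.
statements = fdw_setup_sql(
    server="metrics_cluster",
    host="metrics.example.internal",
    dbname="metrics",
    remote_user="metrics_reader",
    local_schema="metrics",
    tables=["events_daily_rollup"],
)
```

Once these run, the system of record can join its own tables against `metrics.events_daily_rollup` as if it were local, while the heavy ingest work stays on the other cluster.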
