Postgres wasn't always the cool kid. It didn't emerge from millions of dollars of VC funding, and it didn't have a team of DevRel evangelists championing it. It simply started and evolved. Postgres just passed the 25-year milestone as an open source project. With Postgres 16 due for release in the coming months, we thought we'd pause for a minute and look back at Postgres and how it got to where it is today.
If for some reason you've been living under a rock and are unfamiliar with Postgres, give our write-up on Why Postgres a read. Or if you simply want the bullet points:
- Reliable and safe for your data
- Extremely feature rich
- No central owner means no lock-in
- More than "just a database", it's a data platform
Postgres’ roots lie in the Ingres project, which started at the University of California, Berkeley as Ingres (INteractive Graphics REtrieval System) under the direction of Michael Stonebraker. Despite the name, the team changed course and funded a relational database project instead.
The original query language was QUEL, not SQL. ANSI adopted SQL as a standard in 1986, and many of the relational projects moved in that direction. SQL support was added in 1995 when Postgres95 was released.
Development moved out of academia in 1996, when the project was renamed PostgreSQL and the PostgreSQL Global Development Group was established, which is the same leadership group currently heading the project. The first release under the new name, 6.0, followed in early 1997.
The foundational MVCC that makes Postgres what it is was released in 1999 with Postgres 6.5.
Today there are a lot of things you take for granted when working with a reliable database. In the year 2000, Postgres began its journey of growing up. The press release (if they did those things back then) might have read: Postgres JOINS other database ranks with foreign keys and join support.
The rest of the early 2000s brought more key foundational pieces, with a primary focus on reliability and SQL support. You got features such as:
- Write-ahead log (WAL)
- Outer joins
- More than 4 billion transactions
- Drop column
- IPv6 (Postgres was early on this one 😉)
One of my favorite things about looking back at the details from 2000 is seeing contributor names that are still pillars of the Postgres community today. These folks range from my colleague Tom Lane to Josh Berkus, Bruce Momjian, Hiroshi Inoue, and Peter Eisentraut.
But it’s not just the commits. Show up at a PGCon or PGConf.EU and you’ll see some of the very same people pictured here at the PostgreSQL anniversary summit in 2006 (so almost 20 years ago), including Magnus Hagander, Stephen Frost, Bruce Momjian, Devrim Gunduz, Oleg Bartunov, David Fetter, Robert Treat, Simon Riggs, Joe Conway, Tom Lane, Andrew Dunstan, Dave Page, Jean-Paul Argudo, and so many, many more (and to anyone I missed, my apologies; you can point it out to me in person at the next conference).
Ned Lilly said, "Everyone who is involved with Postgres feels really good about its prospects. It's not much of a stretch for them to [think] there's a market opportunity here. Oracle is monolithic, so it's not a leap to visualize [success]. The open-source database wave is starting to crest."
The above statement was made back in 2000, in association with the first Postgres company, Great Bridge. Ned got it right, but was he 5 years, 10 years, or 15 years too early? We will never know, as the company was a dotcom casualty and shut down in 2001, not long after that statement. And if you hadn’t noticed the slight nod to PostgreSQL history in the naming of Crunchy Bridge, well, now you know.
As we get later into the 2000s and hit 2005, Postgres could be viewed as a fairly reliable database. With richer transaction support, broad SQL support, and pieces like WAL and VACUUM improvements, if you were an early adopter you started to trust it for production workloads. It was trustworthy, but still had some ways to go in terms of ease of use.
At this point we start to see a mix of features spanning several themes:
- Concurrent index creation
- Warm standby servers
- Query language improvements
- All the data types - Arrays, UUID, ENUM, XML
- Two-phase commit
- A richer role system
We still saw the same set of folks who had been involved since the early days, but others started to contribute headlining features. Some themes emerged: as always, more SQL support and better performance, plus an embrace of Postgres as an extensible database with new data types. Personally, I think the powerful reporting capabilities really made Postgres a game changer. In 2009, Postgres 8.4 gave us window functions and common table expressions (CTEs). With those in place, I personally would never look back to another database.
With Postgres as a solid foundation, around this time we'd start to see its imprint on the broader data ecosystem. Because of its solid code base and its permissive licensing, many companies took Postgres and began to fork it. From the early to late 2000s, the first thing most of these companies would do was add MPP support to Postgres so it could target more OLAP-focused workloads. When you couple that with support for things like window functions and CTEs, you have something new and powerful without having to build it from the ground up. Starting from Postgres could shave years off the time it takes a database product to mature. Many of those original forks are no longer around today, but some live on in other products...
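To make the reporting point concrete, here is a minimal sketch of a CTE feeding a window function. It is an illustration rather than anything from the original post: the `sales` table and its rows are hypothetical, and the query is run against SQLite through Python's built-in `sqlite3` module purely so the example is self-contained (this assumes a bundled SQLite of 3.25 or newer, which added window functions). The SQL itself sticks to standard syntax that Postgres has accepted since 8.4.

```python
import sqlite3

# In-memory SQLite stands in for Postgres here so the sketch runs anywhere.
# The "sales" table and its contents are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "2009-01", 100),
        ("east", "2009-01", 50),
        ("east", "2009-02", 70),
        ("west", "2009-01", 30),
        ("west", "2009-02", 90),
    ],
)

REPORT_SQL = """
WITH monthly AS (                      -- CTE: one row per region and month
    SELECT region, month, SUM(amount) AS total
    FROM sales
    GROUP BY region, month
)
SELECT region,
       month,
       total,
       -- Window function: running total within each region, ordered by month
       SUM(total) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM monthly
ORDER BY region, month
"""

rows = conn.execute(REPORT_SQL).fetchall()
for row in rows:
    print(row)
```

The CTE names the intermediate aggregate once, and the window function adds the running total without collapsing rows, which is the kind of report that previously tended to need self-joins or application code.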
- Aster Data → Acquired by Teradata
- Truviso → Acquired by Cisco
- Netezza → Acquired by IBM
- Greenplum → Acquired by EMC
- ParAccel → Acquired by Actian, but effectively lives on as Redshift
Despite some proliferation of Postgres forks, Postgres kept doing what it had always done: it kept moving forward.
If you ask me, this is the start of Postgres becoming cool. Support for things like listen/notify (pub-sub from your database) and hstore (a key/value data type) really didn't fit the mold of stodgy old relational databases. Upgrades finally became easier with support for pg_upgrade. We started to get more than just your standard B-Tree index with the arrival of GIN and GiST indexes.
Extensions were always part of Postgres, but with some refactors in the integration they became much more readily available to users. And we saw the development of the Postgres foreign data wrapper, so you could connect disparate Postgres databases.
Postgres was no longer just the old boring data types, columns, and relations. Yet all of it was built on the same ACID-compliant, trustworthy base.
If the world hadn't taken notice of Postgres by now, it was in for a wake-up call. The big data wave was starting to cool off, but with the rise of NoSQL databases (Mongo and Couchbase), it was clear developers wanted a different way of working with their data. Postgres listened, but then sort of cheated with its JSON support: in 9.2 it was JSON validation, but thrown into a text field. In reality, we’d have to wait two more years for credible JSON support inside Postgres.
But that didn’t stop people from starting to take notice. With the rise of Heroku for easy app deployment, and Heroku Postgres as its default database, the tide was shifting from shared MySQL hosting and a VPS for your apps toward PaaS and more dedicated database infrastructure.
Postgres 9.3 was great and all: we got lateral joins, updatable foreign tables, checksums, and more. But with 9.4 we got better JSON in the form of the JSONB data type. JSONB is a binary representation of JSON on disk, which meant GIN indexes let you index your data more easily, without having to build indexes on very specific JSON functions. JSONB is still a data type and feature that blows minds when people first hear about it.
Postgres isn’t only for app developers wanting shiny features. Logical decoding laid the groundwork for future years of easier change data capture (CDC) in Postgres. Refreshing materialized views allowed for richer reporting use cases. Background workers enabled more functionality and creative use cases in particular for extensions.
It was around that 9.3 timeline that Amazon announced PostgreSQL support on RDS at re:Invent. I've been to most re:Invents and was in the room for this one, and it is the only announcement I can recall getting a complete standing ovation from the audience. The speaker was noticeably thrown off by the duration of the applause. This wasn’t Amazon announcing a new thing; it was them caving in to constant customer requests.
By this point, headline features were becoming less common. Instead, we see a theme of steady performance improvements and continued enhancement of existing features. JSONB got support for inline updates, and we started to see more parallel execution come in. But it wasn’t all small updates; depending on your needs, there may have been a major feature in here that you’d been dying for. A few highlights include:
This march forward over several years was still largely the result of individual contributors. We also saw concerted efforts from companies investing in particular areas, like EnterpriseDB (focusing on Oracle compatibility), 2ndQuadrant (focusing on replication), Postgres Pro (focusing on JSONB), and Crunchy Data (focusing on security and cloud native).
Postgres extensions are like libraries: they change Postgres behavior without contributing to the main code. Postgres had long shipped with contrib extensions, and some of these were heavily used and adopted. hstore was the key/value store that people loved before JSONB arrived in Postgres. pg_stat_statements got some major improvements, making it easily one of the most valuable extensions in Postgres.
Outside of contrib, PostGIS moved in parallel to Postgres, transforming Postgres into the world’s most powerful and feature-rich geospatial database. It wasn’t just a one-trick pony: it introduced new operators, new functions, and new data types. For a while PostGIS was the standout example. Then we saw some clever new extensions arrive, like HyperLogLog, and after that entire companies built around this idea of extending Postgres: first Citus, then Timescale, and now a plethora of companies.
At its core maybe it has always been the same as it is today, a reliable database you can trust – after all isn’t that what you want in a database? And yet, other new shiny things may capture your attention for a short time only for you to come back to the reliable stable workhorse we’ve had for over 20 years.
Was it Heroku making it the default, so every Rails developer just started with Postgres? That has to be given some credit.
Even today, people are still shocked to learn of the rich JSON support in Postgres, and it’s been there for over 10 years now.
The richness of its SQL support, the continued performance improvements, and the flexibility in how you work with your data can’t be dismissed.
My personal belief: had it not been for the combination of Heroku and JSON support, there wouldn’t have been pressure for the hyperscalers to add support. And had the hyperscalers not supported it, you’d simply have picked the more available and accessible thing.
But without the continued improvement in performance and features, with a steady release cycle and a focus on quality, it would never have had the chance. For that, the credit always goes to the developers and committers who contribute often thankless work to make Postgres the solid piece of software it is.
August 9, 2023