Joe Conway
This is the first in a series of blogs on the topic of using PostgreSQL for "data science". I put that in quotes because I would not consider myself to be a practicing data scientist, per se. Of course I'm not sure there is a universally accepted definition of data scientist. This article provides a nice illustration of my point. I do believe my credentials are such that no one can accuse me of term appropriation. Toward establishment of that end, this first installment is a walk down memory l...
Jonathan S. Katz
Please Note: This post references an older version of Crunchy Postgres for Kubernetes. See PGO Documentation for the latest version. The Crunchy Data team announced the latest release of our open source PostgreSQL Operator for Kubernetes 4.6 a few weeks back. So let's take a whirlwind tour of how we make it easy to run production-quality Postgres on Kubernetes. With this release, we included features to streamline management of the Operator, added security features, and extra system metric...
Douglas Hunley
Crunchy Data has recently announced an update to the CIS PostgreSQL Benchmark by the Center for Internet Security, a nonprofit organization that provides publications around standards and best practices for securing technology systems. This newly published CIS PostgreSQL 13 Benchmark joins the existing CIS Benchmarks for PostgreSQL 9.5, 9.6, 10, 11, and 12 while continuing to build upon the PostgreSQL Security Technical Implementation Guide (PostgreSQL STIG). A CIS Benchmark is a set...
Jonathan S. Katz
This post provides guidance for v4x. For the latest on PGO, GitOps and Helm installer, please see: https://github.com/CrunchyData/postgres-operator-examples/tree/main/helm In the previous article, we explored GitOps and how to apply GitOps concepts to PostgreSQL in a Kubernetes environment with the Postgres Operator and custom resources. The article went on to mention additional tooling that has been created to help employ GitOps principles within an environment, including Helm. While the m...
Paul Ramsey
A surprisingly common problem in both application development and analysis is: given an input name, find the database record it most likely refers to. It's common because databases of names and people are common, and it's a problem because names are a very irregular identifying token. The page "Falsehoods Programmers Believe About Names" covers some of the ways names are hard to deal with in programming. This post will ignore most of those complexities, and deal with the problem of matching up...
Steve Pousty
Today we are going to walk through some of the preliminary data shaping steps in data science using SQL in Postgres. I have a long history of working in data science, including my Master's degree (in Forestry) and Ph.D. (in Ecology), and during this work I would often get raw data files that I had to get into shape to run analysis. Whenever you start to do something new, there is always some uncomfortableness. That “why is this so hard” feeling often stops me from trying something new, but...
Kat Batuigas
"I want to work on optimizing all my queries all day long because it will definitely be worth the time and effort," is a statement that has hopefully never been said. So when it comes to query optimization, how should you pick your battles? Luckily, in PostgreSQL we have a way to take a system-wide look at database queries:
• Which ones have taken up the most time cumulatively to execute
• Which ones are run the most frequently
• And how long on average they take to execute
...
Joe Conway
Recently I ran across grand, sweeping statements that suggest containers are not ready for prime time as a vehicle for deploying your databases. The definition of "futile" is something like "serving no useful purpose; completely ineffective". See why I say this below, but in short, you probably are already, for all intents and purposes, running your database in a "container". Therefore, your resistance is futile. And I'm here to tell you that, at least insofar as PostgreSQL is concerned, those...
Kat Batuigas
As a GIS newbie, I've been trying to use local open data for my own learning projects. I recently relocated to Tampa, Florida, and was browsing through the City of Tampa open data portal and saw that they have a Public Art map. That looked like a cool dataset to work with, but I couldn't find the data source anywhere in the portal. I reached out to the nice folks on the city's GIS team and they gave me an ArcGIS-hosted URL. To get the public art features into PostGIS I decided to use the "ArcG...
Jonathan S. Katz
The desire to use Pod tolerations to schedule Postgres instances sometimes comes up around complex Kubernetes deployments. To address this feedback, we added support for tolerations to the 4.6 release of the Postgres Operator along with improvements to using node affinity. To use tolerations with PostgreSQL deployments, it helps to understand some of the mechanics behind several Kubernetes features to get the desired result of deploying PostgreSQL to a specific node group. Let's take a loo...