Tutorial Instructions

Partitioning

Partitioning use cases

Data lifecycle & cost management

The main benefit of working with partitions is helping with lifecycle management of data. Big data gets really expensive so archiving off data you no longer need can be really important to managing costs. Using partitioning to manage your data lifecycle means you’ll be rolling off data to an archive frequently, allowing you to drop/archive tables easily as that data no longer needs to exist in a certain database.

Performance: Another benefit the people often look for in terms of partitioning is query performance. Especially when your queries use the indexes or partitioning keys specifically. You can really speed up query times by having things go straight to individual date ranges instead of the entire dataset.

Types of partitioning

There are quite a few different kinds of partitioning based on how you want to subdivide your data.

  • Range partitioning is probably the most common and typically used with time or integer series data.
  • List partitioning is also popular, especially if you have a database that is easily separated by some kind of common field - like location or a specific piece of data across your entire set.
  • Hash partitioning is also available, but should only be used when a clearly defined partition pattern cannot be obtained.
  • Composite partitioning would be combining one or more of these, like time based and list partitioning in the same dataset.

Sample partitioning setup with native Postgres

We have loaded sample data set for an IoT thermostat with a table containing these fields:  date time, thermostat_id, current_temperature, and thermostat_status. Check it out:

SELECT * FROM thermostat LIMIT 10;

Create partitioned table

Now lets create a new table that will be partitioned

CREATE TABLE iot_thermostat ( thetime timestamptz, sensor_id int, current_temperature numeric (3,1), thermostat_status text ) PARTITION BY RANGE (thetime);

And then create an index on thetime field

CREATE INDEX ON iot_thermostat(thetime);

Next create individual partitions

CREATE TABLE iot_thermostat07232022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-07-23 00:00:000') TO ('2022-07-24 00:00:000'); CREATE TABLE iot_thermostat07242022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-07-24 00:00:000') TO ('2022-07-25 00:00:000'); CREATE TABLE iot_thermostat07252022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-07-25 00:00:000') TO ('2022-07-26 00:00:000'); CREATE TABLE iot_thermostat07262022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-07-26 00:00:000') TO ('2022-07-27 00:00:000'); CREATE TABLE iot_thermostat07272022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-07-27 00:00:000') TO ('2022-07-28 00:00:000'); CREATE TABLE iot_thermostat07282022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-07-28 00:00:000') TO ('2022-07-29 00:00:000'); CREATE TABLE iot_thermostat07292022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-07-29 00:00:000') TO ('2022-07-30 00:00:000'); CREATE TABLE iot_thermostat07302022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-07-30 00:00:000') TO ('2022-07-31 00:00:000'); CREATE TABLE iot_thermosta07312022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-07-31 00:00:000') TO ('2022-08-01 00:00:000'); CREATE TABLE iot_thermostat08012022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-08-01 00:00:000') TO ('2022-08-02 00:00:000'); CREATE TABLE iot_thermostat08022022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-08-02 00:00:000') TO ('2022-08-03 00:00:000'); CREATE TABLE iot_thermostat08032022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-08-03 00:00:000') TO ('2022-08-04 00:00:000');

Insert data into your partition

Now we’ll move data from our original ‘thermostat’ dataset into the iot_thermostat and data will automatically go into the correct partitions

INSERT INTO iot_thermostat SELECT * FROM thermostat;

You just need to insert data once, Postgres will take care of moving data to the correct partitions for you.

Quick check on one of the partitions to make sure you got it all right:

SELECT * FROM iot_thermostat07242022 LIMIT 10;

Rotate partitions

Ok, so let’s say that we only care about recent data , so tomorrow we want to put data from iot_thermostat07232022 in a different table and archive it off. This is done by a DETACH.

ALTER TABLE iot_thermostat DETACH PARTITION iot_thermostat07232022;

And that’s now a standalone table.

We need to make a new one for incoming data as well:

CREATE TABLE iot_thermostat0842022 PARTITION OF iot_thermostat FOR VALUES FROM ('2022-08-04 00:00:000') TO ('2022-08-05 00:00:000');

If you’re doing this daily, you’ll store these in a cron job somewhere so they happen automatically.

Loading terminal...

Loading terminal...