Announcing Crunchy Bridge for Analytics—interact with your data lake using Postgres commands via extension, with a vectorized, parallel query engine. Learn more in our announcement.

Tutorial Instructions

Custom data types : user defined types

Why create a custom data type?

Custom data type will help you mostly for data integrity, ensuring your data stored in the database is as you ever wanted it to be. A common surprise gift relies also in the maintenance of it.

With a simple story in this tutorial, you’ll understand how it can help you benefit from those advantages. This tutorial is divided in two sub-tutorials:

  • part 1 covers DOMAIN usage
  • this part 2 covers (your own) TYPE usage

The CREATE TYPE command

If you read part 1 of this tutorial, you can understand that CREATE TYPE command is a bit more complex. Covering all subtleties of that command in a tutorial would result in a very, very big one. Look at CREATE TYPE definition:

Command:     CREATE TYPE
Description: define a new data type
Syntax:
CREATE TYPE name AS
    ( [ attribute_name data_type [ COLLATE collation ] [, ... ] ] )

CREATE TYPE name AS ENUM
    ( [ 'label' [, ... ] ] )

CREATE TYPE name AS RANGE (
    SUBTYPE = subtype
    [ , SUBTYPE_OPCLASS = subtype_operator_class ]
    [ , COLLATION = collation ]
    [ , CANONICAL = canonical_function ]
    [ , SUBTYPE_DIFF = subtype_diff_function ]
    [ , MULTIRANGE_TYPE_NAME = multirange_type_name ]
)

CREATE TYPE name (
    INPUT = input_function,
    OUTPUT = output_function
    [ , RECEIVE = receive_function ]
    [ , SEND = send_function ]
    [ , TYPMOD_IN = type_modifier_input_function ]
    [ , TYPMOD_OUT = type_modifier_output_function ]
    [ , ANALYZE = analyze_function ]
    [ , SUBSCRIPT = subscript_function ]
    [ , INTERNALLENGTH = { internallength | VARIABLE } ]
    [ , PASSEDBYVALUE ]
    [ , ALIGNMENT = alignment ]
    [ , STORAGE = storage ]
    [ , LIKE = like_type ]
    [ , CATEGORY = category ]
    [ , PREFERRED = preferred ]
    [ , DEFAULT = default ]
    [ , ELEMENT = element ]
    [ , DELIMITER = delimiter ]
    [ , COLLATABLE = collatable ]
)

Related documentation page : https://www.postgresql.org/docs/current/sql-createtype.html

Four forms

User defined types can be of 4 forms:

  1. composite type, specified by a list of attribute names and data types
  2. enumerated type, composed of a list of quoted labels
  3. range type, to create a versatile range
  4. a completely new scalar base type that you would need, with all features you need to handle correctly in the database.

The 4th form here won’t be covered in this tutorial, because it should be a completely new tutorial, because it’s vast and complex. It has to be created as a superuser too, “because erroneous type definition could confuse or even crash the server”, as per the PostgreSQL’s documentation.

Composite type

Let’s say your applications manages a delivery system, where letters or packages are delivered somewhere. This application will mostly use the ID of the package most of the time, like, anywhere in the application. But the characteristics of the letter or package itself as dimensions and weight aren’t much used, except to be passed to functions that will determine the type of the package.

Let’s say anything that is bigger than 10 x 13 (inches or whatever unit) and heavier than 18 (ounces or whatever unit) is categorized as a box, and any other thing is categorized as a letter.

We’ll create a new data type, as package with 2 dimensions and a weight, and the table packages using this new composite data type. Just after we create the new table packages using the new data type created:

create type physical_package as ( height numeric , width numeric , weight numeric ); create table packages ( id bigint generated always as identity primary key , properties physical_package );

We can now cast proper formatted data into physical_package data type using the “::” operator (or cast() function, if you want). This is useful for many purposes, among them, inserting values in a readable manner:

insert into packages ( properties ) values ( '(10.3,4.0,0.5)'::physical_package ), ( '(5,3.0,0.2)'::physical_package ), ( '(100,200,400)'::physical_package ), ( '(4,10,50)'::physical_package ), ( '(12,10,100)'::physical_package ), ( '(3.5,5,3.5)'::physical_package );

To access the “columns” of a composite data type, you can’t simply select like my_tape_name.my_sub_type because it will confuse PostgreSQL’s parser, like my_tape_name is actually considered as a table name… The type name must be then set into parenthesis like this in this example, to avoid that error:

select id,(properties).weight from packages;

One of the great benefits of creating a composite type is that it can be passed to a function as a parameter, or the function can return it too. Let’s create a function to categorize the package, whether it’s a letter or a box:

create function categorize_package ( p physical_package ) returns text as $$ select case when ( case when (p).height>10.0 then true else false end or case when (p).width >13.0 then true else false end or case when (p).weight>18.0 then true else false end) then 'box' else 'letter' end ; $$ language sql;

See how we use the newly created type to simplify the parameter list of the function.

This SQL functions uses a trick: it’s a logical comparison of each of the 3 “case when” lines, it returns true if at least one of the 3 “case when” is true. When that happens, the upper case when is true, meaning it’s a box, otherwise, it’s a letter.

We can now categorize each package simply thanks to this function in a single query:

select id, properties, categorize_package(properties) from packages;

We can also use it to do some math with an aggregate:

select categorize_package(properties), count(*) from packages group by 1;

It can be used also to find the letters among the packages, thus, in the WHERE clause:

select id, properties from packages where categorize_package(properties)='letter';

Enumerated type

When you want to emulate the enumerated type (often named as enum in many programming languages), with PostgreSQL, you have to create it with a CREATE TYPE. It’ can’t be created with a domain with some default values, as an example.

Let’s create a package category like enum, each has to be a box or a letter:

create type package_cat as enum ('box','letter');

Now we can modify the categorize_package() function. To avoid problems, we’ll have to drop it first, since you may have created it on the previous stage of this tutorial:

drop function categorize_package; create function categorize_package ( p physical_package ) returns package_cat as $$ select case when ( case when (p).height>10.0 then true else false end or case when (p).width>13.0 then true else false end or case when (p).weight>18.0 then true else false end ) then 'box'::package_cat else 'letter'::package_cat end ; $$ language sql;

In this example, you have to notice two things:

  • a function can return a type we created, not only basic data types PostgreSQL offers
  • we did a cast on ‘box’ and ‘letter’ output of the function to package_cat data type, otherwise, the function is not valid, since we mentioned in its header it “returns package_cat”

All this looks like overkill you may think. It just allows one to have the data as it wants it to be, in any circumstances. In the application we can now think about a new category ‘postcard’ that will have very special physical characteristics. We then just have to add that value to the ‘enum’ that package_cat is:

alter type package_cat add value 'postcard';

Note that, sadly, since PostgreSQL allows us to add values to any ‘enum’, we cannot remove (a) value(s) from it. If you need that you’ll have to:

  1. rename the existing ‘enum’ to some my_enum_old
  2. create the new type with the values you want to keep eg my_enum
  3. update the columns to use that newly created type, with an ALTER TABLE… ALTER COLUMN… TYPE my_enum
  4. drop the old my_enum

To see all elements of that ‘enum’ a simple way is to issue a ‘\dT+’ command and look at the column “Elements” in there:

\dT+ package_cat

One would now have to adapt the function categorize_package(), with postcard’s characteristics to categorize it as a postcard, it’s not needed to go further in this tutorial.

We can now create another table with that new column of type ‘package_cat’ added, and we not tolerate any further any null values

create table packages_with_category ( id bigint generated always as identity primary key , properties physical_package not null , category package_cat not null );

Let’s insert data on it:

insert into packages_with_category ( properties ,category ) values ( '(10.3,4.0,0.5)'::physical_package ,'box'::package_cat ), ( '(5,3.0,0.2)'::physical_package ,'letter'::package_cat ), ( '(100,200,400)'::physical_package ,'box'::package_cat ), ( '(4,10,50)'::physical_package ,'box'::package_cat ), ( '(12,10,100)'::physical_package ,'box'::package_cat ), ( '(3.5,5,3.5)'::physical_package ,'postcard'::package_cat );

As expected, any attempt to insert something wrong will fail:

insert into packages_with_category ( properties ,category ) values ( '(6,6,6)'::physical_package ,'stuff'::package_cat );

Again, like domains, this kind of custom data type helps storing data as it is expected to be, ensuring the quality of your data.

Range type

Range type is a common data type for some time now in PostgreSQL. But the built-in subtypes of the range data type are limited to, basically, integers, big integers, numerics, timestamps with or without time zone and dates.

So the common usage of creating a new type of range, is to have it handle other subtypes.

Let’s say that for each package, the customer accepts some price discounts when the acceptable delay of delivering changes. Like if the transport company does it’s best, it can be delivered in 3 days, but also up to 10 days. We could then define a range of intervals. Let’s create that new range type, composed of intervals:

create type delay as range ( subtype = interval );

Now we create a new table, will previous custom data types, and this this new one:

create table packages_with_delay ( id bigint generated always as identity primary key , properties physical_package not null , category package_cat not null , acceptable_delay delay not null );

Now we insert the data:

insert into packages_with_delay ( properties ,category ,acceptable_delay ) values ( '(10.3,4.0,0.5)'::physical_package ,'box'::package_cat ,'[3 hours,3 days]'::delay ), ( '(5,3.0,0.2)'::physical_package ,'letter'::package_cat ,'[3 days, 10 days]' ), ( '(100,200,400)'::physical_package ,'box'::package_cat ,'[5 days, 30 days]' ), ( '(4,10,50)'::physical_package ,'box'::package_cat ,'[1 day, 10 days]' ), ( '(12,10,100)'::physical_package ,'box'::package_cat ,'[3 hours, 2 days]' ), ( '(3.5,5,3.5)'::physical_package ,'postcard'::package_cat ,'[3 days, 1 month]' );

As an usage example, a new shipment has to be sent today, at it should arrives in 1 or 2 days. What are the relevant packages with should send with this shipment today?

select id ,acceptable_delay from packages_with_delay where acceptable_delay @> '[1 day,2 day]'::delay;

There’s also a boat we’re loading somewhere, and depending the sea conditions, the arrival will be in like 20 days to 1 month, which packages should go to the boat ?

select id ,acceptable_delay from packages_with_delay where acceptable_delay @> '[20 days,1 month]'::delay;

Speaking of ranges types, beware about not confusing “overlapping” and “contains” operators. Overlapping is ‘&&’ as in:

select id ,acceptable_delay from packages_with_delay where acceptable_delay && '[6 days,1 month]'::delay;

While “contains” is ‘@>’ (or ‘<@’, depending how you order the comparison’s elements):

select id ,acceptable_delay from packages_with_delay where acceptable_delay @> '[6 days,1 month]'::delay;

Compare those two results carefully. Look at the range operators table in the PostgreSQL documentation for more information.

Conclusion

In this blog post from Craig Kerstiens, the conclusion is quite fun, and is written before the actual conclusion :

In my humble opinion, enumerated types will help you having the best quality of data. We’d recommend tough not using that this much, most over, with long list of values. Let’s say if you need an enum of more than 10 values, you’d rather instead use a table to store them, and apply some reference (integrity) between the source table and the newly created one.

Custom composite types may be really handy if you deal with a lot of functions, or when you practice some kind of denormalization of your data. Sometimes it can be seen as the “previous version” of tuples stored in actual table. Others to that with some tablename_hist to keep track of changes, etc.

Custom range types usages may be probably one of less usages, since the built-in subtypes of the range data type will surely cover all your needs.

The rise of JSON(B) in PostgreSQL did, at some point, change the interest for composite types and arrays, that PostgreSQL support for long, allowing new usages.

Loading terminal...

Loading terminal...