Tutorial Instructions

Custom data types: DOMAINS

Why create a custom data type?

Custom data type will help you mostly for data integrity, ensuring your data stored in the database is as you ever wanted it to be. A common surprise gift relies also in the maintenance of it.

With a simple story in this tutorial, you’ll understand how it can help you benefit from those advantages. This tutorial is divided in two sub-tutorials:

  • this part 1 covers DOMAIN usage
  • part 2 covers (your own) TYPE usage

Domain usage

Defining your own domains will be of great usage to prevent insertion of unwanted formatted data or data not corresponding exactly to what you want.

Command: CREATE DOMAIN
Description: define a new domain
Syntax:
CREATE DOMAIN name [ AS ] data_type
	[ COLLATE collation ]
	[ DEFAULT expression ]
	[ constraint [ ... ] ]

where constraint is:

Related documentation page: https://www.postgresql.org/docs/current/sql-createdomain.html

An example worth millions words

Let’s say you have a table person in your database defined like:

test=# \d person
                              Table "public.person"
   Column   |  Type   | Collation | Nullable |              Default               
------------+---------+-----------+----------+------------------------------------
 id         | integer |           | not null | nextval('person_id_seq'::regclass)
 firstname  | text    |           | not null | 
 lastname   | text    |           | not null | 
 birth_date | date    |           |          | 
 email      | text    |           | not null | 
Indexes:
    "person_pkey" PRIMARY KEY, btree (id)create table person

You notice that the birth_date can be null, or anything that represents a date, even a date in the 1st century.

You notice also that email is a plain text, any text value could fit well here. On production systems, we see that very often, because CHECKS for data are done inside the application, hoping it does that well for us. A database like PostgreSQL has everything to handle things well, like constraints checking, so why not use this instead? It will allow changing the application from one language to another, first. Second, it will allow usage of the database without going thru the application. In this later case, check constraints are here to prevent any unwanted entry.

Let’s say in your application you to insert or update data only of people born after 1920, likely, living persons, and also, you want to be sure that emails stored are likely to be valid.

One can easily correct this with some checks in the create table. We could have ALTER-ed the table, but for the clarity of this tutorial, we create a new table instead, as we will do another one after this:

create table person_using_checks ( id integer generated always as identity primary key , firstname text not null , lastname text not null , birth_date date , email text not null , check (birth_date>'1930-01-01'::date) , check (email ~ '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$') );

Note that on that example we use the new GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY method to create a surrogate key (see https://www.postgresql.org/docs/15/sql-createtable.htm for more details).

Once the table is created, try to insert this data:

insert into person_using_checks (firstname,lastname,birth_date,email) values ('Jhon','Doe','1970-01-01'::date,'john@doe.org');
insert into person_using_checks (firstname,lastname,birth_date,email) values ('Georges','Washington','1732-02-22'::date,'georges@whitehouse.gov.us');
insert into person_using_checks (firstname,lastname,birth_date,email) values ('Starman','Sky','1972-04-28','unknown');

On the last 2 examples, PostgreSQL tells you what’s wrong in there…

You could let that as it is if you have only that table to manage with this kind of data. But, if you want to apply those checks to other tables, then the best way is to create domains in PostgreSQL, so you can apply that domain definition to any table you want where the data has to be the same in your eyes.

Let’s create those two domains instead:

create domain date_of_birth as date check (value > '1930-01-01'::date) ; create domain valid_email as text not null check (value ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$') ;

Notice that the date_of_birth domain still allows null values, but valid_email doesn’t.

You can list your domains in psql with this command:

\dD

Once those domains are created, we’ll create a new table using the domains, without the need for extra checks, since those are now done thru the domains:

create table person_using_domains ( id integer generated always as identity primary key , firstname text not null , lastname text not null , birth_date date_of_birth , email valid_email );

Try to insert the same data as before, in this new table, and look what happens:

insert into person_using_domains (firstname,lastname,birth_date,email) values ('Jhon','Doe','1970-01-01'::date,'john@doe.org');
insert into person_using_domains (firstname,lastname,birth_date,email) values ('Georges','Washington','1732-02-22'::date,'georges@whitehouse.gov.us');
insert into person_using_domains (firstname,lastname,birth_date,email) values ('Starman','Sky','1972-04-28','unknown');

PostgreSQL still doesn’t want that data, but the message changed a bit, it refers now to the domains’ constraints.

Let’s say the application has now evolved, and we only want to store persons born after 1980.

You’ll have first to drop the constraint of the domain and create a new one, because, at least at the moment, PostgreSQL doesn’t allows to modify the check constraints on the fly of a domain (see https://www.postgresql.org/docs/current/sql-alterdomain.html for complete ALTER DOMAIN command):

alter domain date_of_birth drop constraint date_of_birth_check ;

If you look at the domain definition in psql, you’ll see there are no more Check listed:

\dD date_of_birth

Let’s recreate it now:

alter domain date_of_birth add check (value > '1980-01-01'::date);

PostgreSQL will refuse, because that John Doe we inserted was born on epoch 0, i.e. 01/01/1970.. and then, this record doesn’t match date_of_birth domain requirements.

We can enforce this with the “not valid” in that alter domain, meaning by PostgreSQL’s documentation : “Do not verify existing stored data for constraint validity”. It’s an enforcement, to let data not satisfying the check of the data live inside our database:

alter domain date_of_birth add check (value > '1980-01-01'::date) not valid;

By doing this, now we can ask PostgreSQL to check if the check of that particular domain is respected or not:

alter domain date_of_birth validate constraint date_of_birth_check ;

It won’t tell which data is violating that “new constraint”. You’ll have to figure it out by yourself, selecting all the data not matching (where birth_date < ‘1980-01-01’::data …) and then doing what’s necessary to tidy up your database, as an example, creating a person_history table with same columns and moving this data to it, or just deleting it, depending on what you really want to do.

Domain should be defined when you’re building your schema. Think well on each column, what kind of domain could it be, what check constraints you want, if it’s null-able or not, if a default value makes sense or not, etc.

Too often we see domains are not used, and that’s bad because on any modification of a particular type of data, let’s say zipcode, one will have to modify any column of the table, one by one, and that’s prone to errors and forgetting that this particular column in that particular table, had to be modified too, and one just forgot it… makes it even worse if you have CHECKs in the table’s definition like to validate that a given zipcode type column is valid: you’ll have to modify each one of those too !

Nowadays, most of tools to create schemas and maintain it will allow you to define your own domains and manage inside the tool, that will then most likely build a full SQL script for you to create domains, tables, index, etc… Use it ! Your schema readability and maintainability will be really enhanced greatly.

Loading terminal...

Loading terminal...