A Walk Through PostgreSQL INSERT
Even if you're relatively new to SQL, chances are that you know the INSERT command is used to add new data to a table. For those of you newer to Postgres such as myself, you may not be familiar with a couple of neat tricks you could do with inserts. This post is a refresher on INSERT and also introduces the RETURNING and ON CONFLICT clauses if you haven't used them yet, commonly known as upsert.
Let's start with the basics, we usually see INSERT in two forms. The first explicitly provides the values for each row you want to add, like so:
INSERT INTO event (name, event_dt, mode)
VALUES ('Appointment', tstzrange('2019-12-14 12:00:00 EST', '2019-12-14 12:30:00 EST', '[)'), 'In-person'),
('Workshop', tstzrange('2020-01-11 10:00:00 PST', '2020-01-11 12:00:00 PST', '[)'), 'Virtual');
The other takes the form of a subquery, where column values are retrieved from other tables using SELECT:
INSERT INTO event (client_id, event_id, attend_status)
VALUES (2, (SELECT id FROM event WHERE lower(event_dt) = '2020-01-11 17:00'::timestamptz), 'Canceled');
The first statement above is also an example of a multi row insert. This is more performant than multiple single row inserts, and can come closer to approaching COPY speeds. COPY is Postgres' mechanism for bulk uploading data, such as from a CSV file.
I've since learned there are at least a couple other clauses you could include with INSERT statements if you need. If you're not aware, a clause is a "chunk" of a SQL statement that you can add on to achieve specific behavior. With INSERT, you could have your database return values based on an inserted row. Also, if the insert operation can't proceed because of a conflict, you could have the database perform a different action instead of the usual raise error.
Retrieve values with the RETURNING clause
By default, a successful insert will display a message (otherwise known as the command tag) that indicates the number of rows inserted - the command tag below indicates that one row was inserted:
INSERT 0 1
The RETURNING clause allows you to get back some actual values instead of just seeing this message. This comes in really handy when, for example, you've inserted a row where one of the columns is autogenerated (e.g. serial and identity columns). You wouldn't include the autogenerated column in your insert, and you may not even know beforehand what the expected value is.
With RETURNING, you could ask the database to immediately retrieve that value for you. The following is an example for a client table that stores auto-generated values for id:
INSERT INTO client (first, last, email, career_interests)
VALUES ('Jermayne', 'Frankum', 'jfrankum7@latimes.co', '{"Developer", "Engineering Manager"}')
RETURNING (id);
This effectively lets you skip that additional SELECT statement to grab that id value you need. Cool!
Alternative action for insert conflicts with ON CONFLICT
DO NOTHING
There are two paths you can take with the ON CONFLICT clause.
The first is to tell Postgres to do nothing when a conflict blocks the insert operation. For example, let's say I'm tracking event attendance, and I want to add data per individual (client) attending a particular event.
CREATE TABLE attendance (
event_id INTEGER REFERENCES event(id),
client_id INTEGER REFERENCES client(id),
attend_status TEXT NOT NULL CHECK (attend_status in ('Registered', 'Attended', 'Canceled', 'No Show')) DEFAULT 'Registered',
PRIMARY KEY (event_id, client_id)
);
I've chosen my primary key for this table to be a composite of the event's ID and the person or client's ID.
If I try to do an insert and that person's attendance is already on the table, the insert runs into a conflict, and I get the following error:
ERROR: duplicate key value violates unique constraint "attendance_pkey" DETAIL: Key (event_id, client_id)=(4, 3) already exists.
But if I want the operation to just fail "silently" and move right on if that attendance has already been tracked, I can add ON CONFLICT DO NOTHING with my INSERT statement:
INSERT INTO attendance (event_id, client_id, attend_status)
VALUES ((SELECT id FROM event WHERE lower(event_dt) = '2020-01-11 17:00'::timestamptz), 3, 'No Show')
ON CONFLICT ON CONSTRAINT attendance_pkey
DO NOTHING;
The INSERT statement above just specifies the attendance_pkey constraint to be the basis for whether or not DO NOTHING should take effect. If that constraint is violated, instead of an error, Postgres just skips the insert and displays the usual command tag:
INSERT 0 0
It's also possible to use ON CONFLICT DO NOTHING by itself (that is, without adding the "conflict target" such as ON CONSTRAINT attendance_pkey in the above example).
DO UPDATE (Upsert)
The other option with ON CONFLICT is to specify DO UPDATE, which updates the existing row that conflicts with the insert. In other database systems this is usually referred to as an "upsert":
INSERT INTO attendance (event_id, client_id, attend_status)
VALUES ((SELECT id FROM event WHERE lower(event_dt) = '2020-01-11 17:00'::timestamptz), 3, 'No Show')
ON CONFLICT ON CONSTRAINT attendance_pkey
DO UPDATE SET attend_status = excluded.attend_status;
Following the earlier scenario, suppose that a person having attended a particular event has already been recorded, and I say, "Actually, this new attendance data should take precedence because it's probably more correct, and it should just replace the pre-existing info in the database."
So, in the example above, if a row already exists in the attendance table with the primary key of interest, instead of raising an error, Postgres takes the existing row's attend_status value and updates it with the attend_status value you attempted to insert. (You can reference the excluded pseudo-table for these values. It's definitely handy, although there doesn't appear to be much documentation on this table, and it's often just described as a special table in Postgres.)
Unlike DO NOTHING, for upserts you must specify a conflict target. This lets you be more precise on what should trigger the update as well as what kind of update should happen. For example, say in the attendance table I had another column with a UNIQUE constraint, like event registration codes. In this case, I wouldn't still want a blanket update on attendance_status for that particular constraint violation - I would choose to update the registration code column instead, and that becomes the conflict target for my DO UPDATE clause.
I would highly recommend checking out Depesz' blog for a deep dive on upserts. He also does a performance comparison between ON CONFLICT versus an upsert carried out via a user-defined function.
Try out RETURNING and ON CONFLICT in our Learning Portal
If you'd like to get your feet wet with RETURNING and ON CONFLICT, our Crunchy Data Learning Portal has a new scenario on the INSERT command where we've started up a few small tables (the same ones used in the examples here, in fact) that you can play around with.
I should also mention that Postgres has the COPY command which lets you import data from a file. This is particularly helpful for when you have a large amount of data that you need to get into Postgres. Don't miss our post on Fast CSV and JSON Ingestion in PostgreSQL with COPY, also right here on our Crunchy blog.
Related Articles
- Postgres Partitioning with a Default Partition
16 min read
- Iceberg ahead! Analyzing Shipping Data in Postgres
8 min read
- PostGIS Day 2024 Summary
8 min read
- Crunchy Data Warehouse: Postgres with Iceberg for High Performance Analytics
8 min read
- Loading the World! OpenStreetMap Import In Under 4 Hours
6 min read