In addition to the classic INSERT
and UPDATE
operations, there's another powerful data-modification technique,
called upserting. Combining INSERTs for non-existing records and UPDATEs for existing ones, upserts enable various
use cases to ensure a document exists and is modified accordingly, if it was created earlier, all while maintaining
a concurrency-safe and atomic behavior.
🚣 Starting off simple
Postgres implements basic upserting by adding an ON CONFLICT
clause to INSERT
statements, effectively
catching constraint violations (like an existing record) and allowing to modify existing records or drop
the INSERT altogether. The ON CONFLICT
clause allows to reference an existing constraint or index by
name, or to get an implicit reference by adding the columns that might create a conflict. We also gain
access to the virtual EXCLUDED
table that holds the values proposed for insertion, or rather the
row that is part of the conflict.
To make everything easier to imagine, let's choose a classic example: We've got a blog and want to allow users to comment on our posts. If a comment doesn't exist, we need to create it and link it to the user that commented, as well as the post it should be attached to. If the comment exists, though, we just want to update its text content.
An example upsert query to create a comment or update its content could look like the following
INSERT INTO "post_comment"(
"content",
"post",
"author"
)
VALUES (
'Lorem ipsum...', -- content of our comment
'7b8e2500-2f65-4e60-8a88-33d566defddf', -- link to post
'd9a92161-0c6e-443a-b665-b2ec43233fed' -- link to user
)
ON CONFLICT ("post", "author")
DO UPDATE SET -- we want to update the existing comment
-- we take the same content added in the creation values
content = EXCLUDED."content";
This query allows us to have full confidence in the comment being present afterward: If it does not
exist, we'll create it, otherwise the post
and author
columns, both part of a unique constraint,
would kick off the conflict violation flow, allowing us to update the existing comment with updated
content.
🎬 Scaling Up
The previous example showed us that the INSERT ... ON CONFLICT
syntax can benefit basic upserting flows,
where constraints would otherwise prevent us from completing the operation in one statement. Let's assume
that we'd want to notify the post author, which internally maps to inserting a comment_created
event record
to the notification
table in case the post got created, or inserting comment_updated
when the post did, in fact,
exist already.
In my previous post on conditional INSERTs
, I set out to
perform as many operations as possible in the database, instead of moving this functionality into our business logic layer.
We only want to add the notification if our upsert succeeded, that's one reason for running the queries in a transaction.
The other, hugely important, reason will become obvious in a bit.
For the notification flow, we'd have the following steps:
- Check if the comment exists, make sure it's locked if so, we do not want it to be deleted right now
- Conditionally
INSERT
the comment, if it doesn't exist - Conditionally
UPSERT
the comment, if it does exist - Conditionally
INSERT
thecomment_created
notification, if the comment did not exist before - Conditionally
INSERT
thecomment_updated
notification, if the comment existed before
Let's first start by checking whether the comment exists already:
SELECT
CASE WHEN COUNT(
(
SELECT
"id" -- can be any column, even something virtual
FROM "comment"
WHERE
"post" = '7b8e2500-2f65-4e60-8a88-33d566defddf' AND
"author" = 'd9a92161-0c6e-443a-b665-b2ec43233fed'
FOR UPDATE
)
) < 1
THEN set_config('blog_app.upsert.comment_exists', 'false', 'true')
ELSE set_config('blog_app.upsert.comment_exists', 'true', 'true')
END;
This query might look strange at first, but let me explain what we're doing here: The inner-most statement will fetch all comments made by our user, limited to the ones on the current blog post. If the query returns something, the comment row will be locked for all data-modifying operations outside of the current transaction. This prevents the comment from being modified or deleted while we're working on it, and it's a crucial step to guarantee atomicity for our upsert.
In case a comment exists, the count won't be zero and thus we'll execute the set_config
command, which modifies a transaction-based configuration Postgres manages and adds
a flag that our comment exists. This might not be a conventional method, but since the value
is only present throughout the transaction, we won't leak anything, and by naming it specific
to our use case, we won't change any Postgres configuration setting.
We set the variable in both cases so we can debug any potential issues and check which result the query returned.
Now, to the conditional queries that upsert a comment and create a notification based on
our first SELECT
:
-- Insert a post comment conditionally,
-- only when comment does not exist
INSERT INTO "post_comment"(
"content",
"post",
"author"
)
SELECT * FROM (
VALUES(
'Lorem ipsum...', -- content of our comment
'7b8e2500-2f65-4e60-8a88-33d566defddf', -- link to post
'd9a92161-0c6e-443a-b665-b2ec43233fed' -- link to user
)
) AS t(text, uuid, uuid)
WHERE current_setting('blog_app.upsert.comment_exists', 'true') = 'false'
-- Update the post comment
-- only if it did not exist before
-- This is important because we don't want to override
-- any INSERTed columns that might be different than UPDATE values
UPDATE "post_comment"
SET "content" = 'Lorem ipsum...'
WHERE
current_setting('blog_app.upsert.comment_exists', 'true') = 'true' AND
"post" = '7b8e2500-2f65-4e60-8a88-33d566defddf' AND
"author" = 'd9a92161-0c6e-443a-b665-b2ec43233fed'
-- Create a notification
-- Determine notification type by
-- temporary comment_exists value
INSERT INTO "notification"(
"type",
"user"
)
VALUES(
-- If post already existed, use comment_updated type
(
CASE WHEN
current_setting('blog_app.upsert.comment_exists', 'true') = 'true'
THEN
'comment_updated'
-- Otherwise, use comment_created type
ELSE
'comment_created'
END
),
'd9a92161-0c6e-443a-b665-b2ec43233fed' -- link to user
);
This concludes all the steps we needed to build. Now, when a user creates a comment on a post, we'll ensure that the comment is created or updated accordingly, as well as create a notification for the post author.
I hope you enjoyed this post, maybe you learned a thing or two about Postgres as well! If you've got any feedback, questions or suggestions, please let me know via Twitter or mail!