Feb 16, 2020

Utilizing the data layer: Complex UPSERTs in Postgres

In addition to the classic INSERT and UPDATE operations, there's another powerful data-modification technique, called upserting. Combining INSERTs for non-existing records and UPDATEs for existing ones, upserts enable various use cases to ensure a document exists and is modified accordingly, if it was created earlier, all while maintaining a concurrency-safe and atomic behavior.

🚣 Starting off simple

Postgres implements basic upserting by adding an ON CONFLICT clause to INSERT statements, effectively catching constraint violations (like an existing record) and allowing to modify existing records or drop the INSERT altogether. The ON CONFLICT clause allows to reference an existing constraint or index by name, or to get an implicit reference by adding the columns that might create a conflict. We also gain access to the virtual EXCLUDED table that holds the values proposed for insertion, or rather the row that is part of the conflict.

To make everything easier to imagine, let's choose a classic example: We've got a blog and want to allow users to comment on our posts. If a comment doesn't exist, we need to create it and link it to the user that commented, as well as the post it should be attached to. If the comment exists, though, we just want to update its text content.

An example upsert query to create a comment or update its content could look like the following

INSERT INTO "post_comment"(
  "content",
  "post",
  "author"
)
VALUES (
  'Lorem ipsum...', -- content of our comment
  '7b8e2500-2f65-4e60-8a88-33d566defddf', -- link to post
  'd9a92161-0c6e-443a-b665-b2ec43233fed' -- link to user
)
ON CONFLICT ("post", "author")
DO UPDATE SET -- we want to update the existing comment
  -- we take the same content added in the creation values
  content = EXCLUDED."content";

This query allows us to have full confidence in the comment being present afterward: If it does not exist, we'll create it, otherwise the post and author columns, both part of a unique constraint, would kick off the conflict violation flow, allowing us to update the existing comment with updated content.

🎬 Scaling Up

The previous example showed us that the INSERT ... ON CONFLICT syntax can benefit basic upserting flows, where constraints would otherwise prevent us from completing the operation in one statement. Let's assume that we'd want to notify the post author, which internally maps to inserting a comment_created event record to the notification table in case the post got created, or inserting comment_updated when the post did, in fact, exist already.

In my previous post on conditional INSERTs, I set out to perform as many operations as possible in the database, instead of moving this functionality into our business logic layer. We only want to add the notification if our upsert succeeded, that's one reason for running the queries in a transaction. The other, hugely important, reason will become obvious in a bit.

For the notification flow, we'd have the following steps:

  • Check if the comment exists, make sure it's locked if so, we do not want it to be deleted right now
  • Conditionally INSERT the comment, if it doesn't exist
  • Conditionally UPSERT the comment, if it does exist
  • Conditionally INSERT the comment_created notification, if the comment did not exist before
  • Conditionally INSERT the comment_updated notification, if the comment existed before

Let's first start by checking whether the comment exists already:

SELECT
  CASE WHEN COUNT(
    (
      SELECT
        "id" -- can be any column, even something virtual
      FROM "comment"
      WHERE
        "post" = '7b8e2500-2f65-4e60-8a88-33d566defddf' AND
        "author" = 'd9a92161-0c6e-443a-b665-b2ec43233fed'
      FOR UPDATE
    )
  ) < 1
  THEN set_config('blog_app.upsert.comment_exists', 'false', 'true')
  ELSE set_config('blog_app.upsert.comment_exists', 'true', 'true')
END;

This query might look strange at first, but let me explain what we're doing here: The inner-most statement will fetch all comments made by our user, limited to the ones on the current blog post. If the query returns something, the comment row will be locked for all data-modifying operations outside of the current transaction. This prevents the comment from being modified or deleted while we're working on it, and it's a crucial step to guarantee atomicity for our upsert.

In case a comment exists, the count won't be zero and thus we'll execute the set_config command, which modifies a transaction-based configuration Postgres manages and adds a flag that our comment exists. This might not be a conventional method, but since the value is only present throughout the transaction, we won't leak anything, and by naming it specific to our use case, we won't change any Postgres configuration setting.

We set the variable in both cases so we can debug any potential issues and check which result the query returned.

Now, to the conditional queries that upsert a comment and create a notification based on our first SELECT:

-- Insert a post comment conditionally,
-- only when comment does not exist
INSERT INTO "post_comment"(
  "content",
  "post",
  "author"
)
SELECT * FROM (
  VALUES(
    'Lorem ipsum...', -- content of our comment
    '7b8e2500-2f65-4e60-8a88-33d566defddf', -- link to post
    'd9a92161-0c6e-443a-b665-b2ec43233fed' -- link to user
  )
) AS t(text, uuid, uuid)
WHERE current_setting('blog_app.upsert.comment_exists', 'true') = 'false'
-- Update the post comment
-- only if it did not exist before
-- This is important because we don't want to override
-- any INSERTed columns that might be different than UPDATE values
UPDATE "post_comment"
SET "content" = 'Lorem ipsum...'
WHERE
  current_setting('blog_app.upsert.comment_exists', 'true') = 'true' AND
  "post" = '7b8e2500-2f65-4e60-8a88-33d566defddf' AND
  "author" = 'd9a92161-0c6e-443a-b665-b2ec43233fed'
-- Create a notification
-- Determine notification type by
-- temporary comment_exists value
INSERT INTO "notification"(
  "type",
  "user"
)
VALUES(
  -- If post already existed, use comment_updated type
  (
    CASE WHEN
      current_setting('blog_app.upsert.comment_exists', 'true') = 'true'
    THEN
      'comment_updated'
    -- Otherwise, use comment_created type
    ELSE
      'comment_created'
    END
  ),
  'd9a92161-0c6e-443a-b665-b2ec43233fed' -- link to user
);

This concludes all the steps we needed to build. Now, when a user creates a comment on a post, we'll ensure that the comment is created or updated accordingly, as well as create a notification for the post author.


I hope you enjoyed this post, maybe you learned a thing or two about Postgres as well! If you've got any feedback, questions or suggestions, please let me know via Twitter or mail!