May 22, 2022

Modeling Polymorphic Relations in Postgres

Imagine you’re about to build a site builder with multiple content blocks. Users can add paragraphs, callouts, links, and whatever else you can imagine. You wonder how you could model this in your database, as sites should be linked to all connected blocks, but you do not want to create one wide table for your content blocks, instead, you would like distinct tables for each block kind, making it easy to evolve content blocks separately.

Now, you might already think, this sounds an awful lot like polymorphic relations, or unions, depending on your stack. If not, don’t worry, we’ll learn about those in a second.

Traditionally, relations are modeled between two entities. The classic example of students and courses, posts and authors, books and libraries, you name it. These relations all connect two tables in your data model, sometimes via embedded foreign keys, sometimes via a separate mapping table.

But sometimes, you need to add not one, but potentially multiple, reverse sides to a relation. This is referred to as a polymorphic relation or association. This is exactly what we need, as we want distinct tables for each content block, evolving the supported blocks over time.

So let’s move ahead and check out how we can make this happen

Content tables

First, we need to add some tables, one for sites, and then one table per content block. For this guide, I went with three simple blocks, but you can create as many as you like.

create table "site" (
  "id" varchar(64) not null,
  "title" varchar(64) not null,

  constraint "site_pkey" primary key ("id")
);

create table "paragraph" (
  "id" varchar(64) not null,
  "content" text not null,

  constraint "paragraph_pkey" primary key("id")
);

create table "link" (
  "id" varchar(64) not null,
  "content" text not null,
  "href" varchar(512) not null,

  constraint "link_pkey" primary key("id")
);

create table "callout" (
  "id" varchar(64) not null,
  "content" text not null,

  constraint "callout_pkey" primary key("id")
);

Nothing special so far, just plain old tables. If you’re curious why I used varchar(64) for primary keys/identifiers, check out the related guide.

Modeling polymorphic relations

Now, to the interesting part. We’ll create a mapping table that connects our site table to the content blocks. I’ll show you the table creation first, then elaborate on the design decisions that lead to it.

create table "site_block" (
  "site" varchar(64) not null,

  "block" varchar (64) not null generated always as (coalesce("paragraph", "link", "callout")) stored,

  "position" bigint not null,

  "paragraph" varchar(64),
  "link" varchar(64),
  "callout" varchar(64),

  constraint "site_block_pkey" primary key ("site", "block"),

  constraint "site_block_site_fkey" foreign key ("site") references "site" ("id") on delete cascade,

  constraint "site_block_paragraph_fkey" foreign key ("paragraph") references "paragraph" ("id") on delete cascade,
  constraint "site_block_link_fkey" foreign key ("link") references "link" ("id") on delete cascade,
  constraint "site_block_callout_fkey" foreign key ("callout") references "callout" ("id") on delete cascade,
  constraint "site_block_position_key" unique ("site", "position")
);

As you can see, we have a compound primary key made up of the site and block IDs. But now it gets interesting: For each content block, we add an optional ID column with a foreign key to the table the block data is stored in. To create a lean primary key, we then select the first block ID from the content blocks using a generated column that is read-only and populated on write.

This little trick allows us to add as many content blocks as we want, we merely have to add another column and update the generated column. If you don’t want to do the latter or your database system does not support generated columns yet, you can manually write the block column in your insert queries.

We store a column for each content block to achieve referential integrity: Whenever the site is deleted, the link is dropped, too. Content blocks may be re-used across sites, so we do not cascade to the blocks when we drop a site.

Another important rule is that only one of the content block columns may be set per row, which you could enforce with a check constraint.

To summarize: We create a new ID column and foreign key for every content block and select the block ID for our primary key using a generated column. Let’s insert and fetch some data to understand how we can work with this data model.

insert into "site" ("id", "title") values ('29Vq2cDSdPiu8J8GSxIPMeuvWek', 'Building Polymorphic Relations in Postgres');

insert into "paragraph" ("id", "content") values ('29Vq3MWcpDxUXIHidPq98Q4Noh8', '...');

insert into "callout" ("id", "content") values ('29VsIxqSLlqwe2DFdvKXfr6fFj0', 'Some warning here');

insert into "link" ("id", "content", "href") values ('29VsIwUYqabT4iPxLu48NQsMYjT', 'Check out Anzu', 'https://anzuhq.com');

insert into "site_block" ("site", "position", "paragraph")
  values (
    '29Vq2cDSdPiu8J8GSxIPMeuvWek',
    coalesce(
      (select max("position") from "site_block" where "site" = '29Vq2cDSdPiu8J8GSxIPMeuvWek'),
      -1
    ) + 1,
    '29Vq3MWcpDxUXIHidPq98Q4Noh8'
  );

insert into "site_block" ("site", "position", "callout")
  values (
    '29Vq2cDSdPiu8J8GSxIPMeuvWek',
    coalesce(
      (select max("position") from "site_block" where "site" = '29Vq2cDSdPiu8J8GSxIPMeuvWek'),
      -1
    ) + 1,
    '29VsIxqSLlqwe2DFdvKXfr6fFj0'
  );

insert into "site_block" ("site", "position", "link")
  values (
    '29Vq2cDSdPiu8J8GSxIPMeuvWek',
    coalesce(
      (select max("position") from "site_block" where "site" = '29Vq2cDSdPiu8J8GSxIPMeuvWek'),
      -1
    ) + 1,
    '29VsIwUYqabT4iPxLu48NQsMYjT'
  );

As you can see from the last three statements, we only have to supply the site, position, and content block ID from the block we want to link to our site. We do not have to supply the block column, as it will be generated in the background. Unfortunately, we have to supply the position, as we cannot run subqueries in generated column expression.

The whole point of the coalesce(select(max) …, 0) + 1 expression is to retrieve the next position for a content block, effectively appending it to the bottom of the site.

With some data in our system, let’s query it! Imagine a GraphQL API using a union type for content blocks, where we want to return a JSON object with a type discriminator field and details about the respective content blocks.

Using a CTE (or with statement), we can first load all blocks for a given site ordered by their position, then use the union set operation to chain together concrete queries for each content type. This is done to avoid wide joins and creates distinct selects for each content type. We then merge all fields into one JSON object so union all receives the same columns from each statement.

with "blocks" as (
  select * from "site_block" where "site" = '29Vq2cDSdPiu8J8GSxIPMeuvWek' order by "position" asc
)
select json_build_object('id', "id", 'content', "content", 'kind', 'paragraph') as "data"
  from "blocks", "paragraph" where "id" = "blocks"."block" union all
select json_build_object('id', "id", 'content', "content", 'kind', 'callout') as "data"
  from "blocks", "callout" where "id" = "blocks"."block" union all
select json_build_object('id', "id", 'content', "content", 'href', "href", 'kind', 'link') as "data"
  from "blocks", "link" where "id" = "blocks"."block"
;

If you don’t understand the query above, try playing around with it a bit and it should become clear. Running it yields

[
  {
    data: '{"id" : "29Vq3MWcpDxUXIHidPq98Q4Noh8", "content" : "...", "kind" : "paragraph"}'
  },
  {
    data: '{"id" : "29VsIxqSLlqwe2DFdvKXfr6fFj0", "content" : "Some warning here", "kind" : "callout"}'
  },
  {
    data: '{"id" : "29VsIwUYqabT4iPxLu48NQsMYjT", "content" : "Check out Anzu", "href" : "https://anzuhq.com", "kind" : "link"}'
  }
];

As you can see, we have received three rows with distinct kind fields.

To summarize,

  • we have created a polymorphic relation for our three content blocks
  • we have stored content block data in separate tables that can evolve over time
  • we have fetched data using union all to create distinct statements that may evolve over time