Sep 26, 2022

Building Sortable Relations with PostgreSQL

Relational databases have been at the foundation of the internet for the past four decades. What was built for business workflows in the past is now used for building dynamic applications powered by user-generated data.

Sortable relations are one of the most important features in database modeling: Whether we’re talking about playlists, queues, preferences, or other content, sortable lists are everywhere. And with relational databases, you store links to both sides of the relation together with a position, but how does that really work?

In the following, we’ll dive deep into modeling and implementing sortable relations in PostgreSQL. Once you’re done reading, you will be able to build sortable relations for your application, using your stack of choice.

A primer on sortable relations

Imagine you’re building a store and want to store a wish list for every customer. More precisely, you want to store an arbitrary number of items for each customer, where the order is a proxy for the priority.

In a relational database, you would now store two tables with content: One for items, and another one for customers. In the second step, you would add a table that links both the item and the customer.

Once created, a customer can add items to their wish list, and drag them around in the UI. In the backend, moving an item can be done in four ways: Moving it to the start or end, or moving it before or after a specific item on the list.

With this interface, you can implement dragging in the front-end application. Once a user queries their wish list, it will return the items in the same order they arranged them.

To make this example more accessible, let’s continue by creating the tables in question.

A note on performance

Before we get into the depths of storing and retrieving relations, you should think about limits and complexity. The more relations are stored, the more expensive operations become that work on all rows. This means that choosing a strategy for moving relations that involves moving all items in the wish list of a user becomes increasingly slow if you store millions of items.

For the wish list, adding a cap of 100 or 1000 entries should be plenty and creates a predictable load on the database. As I’ve mentioned in the past, you should add pagination and limits or services quotas to all resources, otherwise, you cannot fulfill SLAs.

Whatever you do, never promise “infinite” anything.

When you’re done with this guide you may want to add sorting capabilities to all relation tables, just in case, but I think most relations don’t have to be user-sortable in the first place. Usually, most tables are sorted by a timestamp determined at insertion time, so you don’t have to add positions.

Creating a testbed

First, let’s define our tables. We’ll store our items and customer in regular tables and create a mapping table between the two, using foreign keys for each side. We’ll also add a position to the mapping table to define the order of items for each customer.

For some relations, it may make sense to add a position for the reverse side as well but for now, we’ll make the relation sortable only in one direction (a customer can sort their wish list).

create table "item" (
  "id" varchar(64) NOT NULL,
  "name" text NOT NULL,

  CONSTRAINT "item_pkey" PRIMARY KEY ("id")
);

create table "customer" (
  "id" varchar(64) NOT NULL,
  "name" text NOT NULL,

  CONSTRAINT "customer_pkey" PRIMARY KEY ("id")
);

create table "wish_list" (
  "item" varchar(64) NOT NULL,
  "customer" varchar(64) NOT NULL,
  "position" double precision NOT NULL,

  CONSTRAINT "wish_list_pkey" PRIMARY KEY ("item", "customer"),
  CONSTRAINT "wish_list_item_fkey" FOREIGN KEY ("item") REFERENCES "item" ("id") ON DELETE CASCADE,
  CONSTRAINT "wish_list_customer_fkey" FOREIGN KEY ("customer") REFERENCES "customer" ("id") ON DELETE CASCADE
);

You might wonder why we chose double precision for our position column. Similar to real, double precision represents inexact, variable-precision numeric values (floating point numbers). Inexact means that, in some cases, a value may be slightly different compared to the value that was entered when inserting the row. For us, this is acceptable, we’ll cover why that is later on.

Up next, we’ll add some data to our tables and connect a couple of items to our example customers.

insert into "item" ("id", "name")
values
  ('road-bike', 'Road-Bike'),
  ('candy-bar', 'Candy Bar'),
  ('gopher-plushie', 'Gopher Plushie (blue)'),
  ('keyboard', 'Mechanical Keyboard'),
  ('subwoofer', 'Smart Subwoofer')
;

insert into "customer" ("id", "name")
values
  ('bill', 'Bill G.'),
  ('steve', 'Steve J.')
;

insert into "wish_list" ("item", "customer", "position")
values
  ('subwoofer', 'steve', 0),
  ('road-bike', 'steve', 1),
  ('keyboard', 'steve', 2),

  ('candy-bar', 'bill', 0),
  ('gopher-plushie', 'bill', 1)
;

Inserting new data

When a user wants to add a new item to their wish list, we can default to adding it behind the last item to become the new end of the list. This approach is the easiest, but we will see more options like adding the new item before or after another item, or at the end of the list, later on when we think about moving items around.

insert into "wish_list" ("item", "customer", "position")
values
  (
    'candy-bar',
    'steve',
    (
      coalesce(
        (select max(position) from "wish_list" where "customer" = 'steve'),
        -1
      ) + 1
    )
  )
;

This statement inserts a new row into the wish_list table, adding item b to the end of our customer steve's list. To understand some of the functions we’ll be using throughout this guide, let’s break down the last expression.

The coalesce(args…) function receives multiple arguments and returns the first non-null argument value. In our case, we first evaluate the first argument

select max(position) from "wish_list" where "customer" = 'steve'

which will return the current-highest position on the wish list of customer steve. In case their list is still empty, one row with a null value will be returned. This is why we need to use coalesce, to return a fallback value of -1.

We then go ahead and add 1 to the return value of the coalesce function to increase the position of the item we want to insert, pushing it at the end of the list (or as the first item with position 0).

We can also rewrite the coalesce usage to

select
  coalesce(max(position), -1) + 1
from "wish_list" where "customer" = '1'

This places the coalesce in the same statement that selects the current max position. Both statements yield the same result, I prefer the first option as it easily shows the options we have (either an existing result or the first item in the list).

That’s all we need for inserting new items to the end of the list, easy as that.

Moving items

As we want our customers to move items around the list to prioritize the most important ones, we need to implement the moving part. So far, we have a table containing rows with increasing positions. Now, we want to support moving items in four ways: Adding them after or before other items, and adding them to the start or end of the list.

To move items, we can implement two basic strategies.

Always Rearrange

The most straightforward way to add a new row between two existing ones is to shift all previous or subsequent row positions by one, creating a “gap” into which our new row can be placed.

Imagine the following list

select "position", "item"
from "wish_list"
where "customer" = 'steve'
order by "position" asc;

1. subwoofer
2. road-bike
3. keyboard
4. candy-bar

If we want to add gopher-plushie before keyboard, the idea with rearranging is that we first find out the position of keyboard (3) and shift all positions ≥ 3 by one.

update "wish_list"
set "position" = "position" + 1
where "customer" = 'steve' and "position" >= 3;

1. subwoofer
2. road-bike
< empty >
4. keyboard
5. candy-bar

Now, we can simply insert our row with the previous position of keyboard.

insert into "wish_list" ("position", "name") values (3, 'gopher-plushie');

1. subwoofer
2. road-bike
3. gopher-plushie
4. keyboard
5. candy-bar

The same logic can be used to add an item after another one.

As you can see, this system is understandable, works in integer steps, and supports small data sets completely fine. For bigger tables, moving millions of rows for one operation may lead to issues, though. This is exactly why you should think about whether a relation should be (user-) sortable, and how many rows it will contain, restricted by service quotas.

Floating-Point Ordering

Another strategy we can use to limit updating rows is to use the floating point data type to average positions when we need to insert a value in between two rows. We can make this easier to grasp with a short example

select "position", "item"
from "wish_list"
where "customer" = 'steve'
order by "position" asc;

1. subwoofer
2. road-bike
3. keyboard
4. candy-bar

Once again, we want to insert gopher-plushie before keyboard, but instead of creating a gap as before, we average the existing items for a new position

insert into "wish_list" ("item", "customer", "position")
values
  (
    'gopher-plushie',
    'steve',
    (
      -- find position of item referenced in before argument (default to 0)
      with "before_position" as (
        select coalesce(
          (
            select "position" from "wish_list" where "customer" = 'steve' and "item" = 'keyboard'
          ),
          -- in case item was not found, use last item in list and add 1 to add item to end of list
          (
            select max("position") + 1 from "wish_list" where "customer" = 'steve'
          ),
          -- in case no item exists, use 0
          0
        ) as "position"
      ),
      -- find position of previous item relative to "before item"
      "before_prev_position" as (
        select coalesce(
          (
            select w."position"
            from "wish_list" w, "before_position" b
            where
              w."customer" = 'steve' and
              -- positions may not be integers, so we cannot simply deduct 1
              -- this is why we find the first item with a smaller position
              w."position" < b."position"
            order by "position" desc limit 1
          ),
          -- in case previous position does not exist (before item was first in list), simply deduct 1
          (select b.position - 1 from "before_position" b)
        ) as "position"
      )
      -- average both positions to fit new row into gap
      select (b.position + p.position) / 2
      from "before_position" b, "before_prev_position" p
    )
  )
;

There’s a lot to this statement, and I recommend you read through it carefully and run the select statements individually. Most of the query is finding out the positions of the item that we want to place gopher-plushie before and the item before that so that we can calculate the average for an artificial gap.

select "position", "item"
from "wish_list"
where "customer" = 'steve'
order by "position" asc;

1. subwoofer
2. road-bike
2.5. gopher-plushie
3. keyboard
4. candy-bar

With this strategy, we avoid moving multiple rows, as each move simply calculates the average between existing up to two existing rows. We gracefully handle the case that the referenced existing item does not exist or is the first (last) element of the list so that we don’t have undefined behavior.

An issue with this approach is that you can order items only a limited number of times until you lose precision and run into inconsistencies as the double precision data type is inexact by nature. This could be solved by using the exact numeric data type or by normalizing the order periodically, removing all gaps, and setting positions to their integer values in the process.

Querying rows

If you want to fetch all existing items for a given customer, you may want to include the position in your query output. While you could write a simple SELECT like the following

SELECT
  wish_list.item,
  wish_list.position
FROM wish_list WHERE "customer" = 'steve' ORDER BY "position" ASC;

subwoofer	0
road-bike	1
gopher-plushie	1.5
keyboard	2
candy-bar	3

this will return non-integer values once you’ve moved items around with the floating point strategy. Using window functions, however, you can calculate positions on the fly

SELECT
  wish_list.item,
  row_number() OVER (ORDER BY "position" ASC) as "position"
FROM wish_list WHERE "customer" = 'steve';

subwoofer	1
road-bike	2
gopher-plushie	3
keyboard	4
candy-bar	5

Wrapping up

By now, you’ve learned how to implement sortable relations in PostgreSQL, using a position column in the mapping table. You also know how to implement the insertion of new rows before or after existing items or at the start or end of the list. Depending on your use case, you can use regular positioning that moves existing rows around or go for a floating-point approach that doesn’t require updates to existing rows but has other downsides.

Sortable relations are exciting and powerful to expose to your users, but this comes with the responsibility of knowing when to use them, and how to create limits to make sure your system is predictable at scale.