Efficient Caching with Reduced Roundtrips

Imagine you're building a service that heavily depends on content served by an external provider. Sending a request each time a resource is accessed would quickly lead to rate-limiting, so to provide an improved experience, we want to introduce caching.

If a cached resource expired or was not cached before, we have to fetch external data and store it in a shared data source.

We want to share our final cache as our system runs on more than one instance, and will be accessed by multiple users. We also want to increase our cache hit ratio as much as possible, to decrease roundtrips and enjoy improved latency.

With this setup, you have to add protection against two types of concurrent access: When your client is running operations concurrently, you could run into inconsistencies and side effects (data races) in the same client. If you're modifying a remote resource that other parties could access concurrently, you either have to design your system so multiple users modifying it at the same time doesn't have negative side effects or to introduce locking so only one actor can perform operations on a resource at a given moment.

Managing concurrent access and mutation of shared resources can be one of the most difficult topics when engineering systems. You can choose from plenty of mechanisms to ensure exclusive access, but often focus on locking.

If we encounter a cache miss, and thus have to fetch the resource once so we can store it, we want to send a limited number of requests to the third-party provider, as we could otherwise face issues with rate-limiting.

🗺 Cache Flow

To achieve fast subsequent access to a cached resource, we'll set up a per-request resource cache, that will store all accessed resources for the duration of a request.

If the resource is not found in this cache, we access the shared cache and check out if it exists on that layer. If not, or if it is expired, we'll re-fetch the resource from the source.

As you can imagine, accessing the per-request cache will be a very fast operation, using the shared cache will be slightly slower, and the "worst" outcome would be to send a request to the external provider, as this incurs latency and increases the usage toward our rate-limit.

✂️ Reducing Roundtrips with Deduplication and Batching

When we think about our main objectives, reducing external requests and serving (cached) content fast, we'll try to stuff as much as possible into very few requests.

If a user requests three different resources, for example, we could just perform three independent cache lookups, and send three requests to fetch the external resource to store it in our shared cache.

This has a negative effect on the requests we can send, as we might send a lot of those, and even worse, if resources are requested more than once, we perform completely unnecessary operations.

This leads to the first improvement: Deduplication. When accessing multiple resources concurrently, we'll make sure each request to our cache is for a unique resource. If the requested entity is not cached, we'll perform as many unique requests as is necessary. While this is better than the first approach, we still send multiple requests to the external service provider. Sometimes, however, batch APIs can be used to retrieve multiple entities at once, in a single request.

If this possibility exists, we can add another layer of batching, which will allow us to collect all potential requests to the external service, and send them off in one request. We'll then cache the returned content, and return it to the user.

⛓ Serial Batches

We previously introduced batching into our caching lifecycle. To reduce unnecessary requests of resources that might already be cached, we need to make sure that each batch is run exclusively, not in parallel with other batches of the same resource. Imagine one batch for a resource with the following IDs

Batch 1 IDs: 1, 2, 3

and another batch with the following contents

Batch 2, IDs: 4, 5, 1

As we can see, we're requesting the resource with ID 1 not once, but twice. If we run batches in parallel, we'll perform an unnecessary lookup. If we schedule the batches to run serially, though, we can easily return the previously processed resource in the second batch.

🔒 Transactional Locks to Prevent Side Effects

One last aspect of our cache layer is concurrency. Since we're operating on a shared cache, two users could retrieve the same resource at the same time, and even a single user could retrieve the same resource concurrently.

To prevent this, we'll have to add safeguards against concurrent access, at least where it is needed to prevent unexpected side-effects like creating a resource twice, which might violate some constraints.

Concurrent access can come into two forms, as explained above

in the same transaction/process
in a different transaction

Usually, each access type has to be mitigated in its own way: Accessing the same resource in the same request or transaction should be limited by batching, deduplication, and locks, if needed. Requests to one specific resource should always occur one at a time. This reduces latency, increases your cache hit ratio, and prevents unnecessary work.

Then there's the second type: What if two users were to access the same resource? They might hit different deployments, and our batches are created on a per-request basis, so we're not able to leverage the same structure as before.

For these cases, you should enforce exclusive access where needed: If you already know that some parts have to run coupled together and would break when modified concurrently, you have two options:

Check if you can modify the transaction isolation level, which can prevent side-effects from appearing while you're working on a resource
Lock access to a specific resource until you're finished working on it.

We'll consider the second alternative here, introducing locks. With Postgres, you can introduce session- or transaction-based advisory locks. The latter will unlock automatically once the transaction is completed (the outcome does not matter, they will be unlocked on COMMIT as well as on ROLLBACK).

begin;
-- lock access to resource "1"
select pg_advisory_xact_lock(1);
-- do some work
commit;

Running the query above will make sure that whenever another request accessing the same resource starts a transaction after this one, it has to wait until the first transaction is completed. In this time you can perform all actions that need strict transactional guarantees, after which the next transaction in line will run and secure exclusive access for its lifetime.

You could even extend locking to make sure the fetch is also executed just once, although the more locks you add, the more artificial delay you might incur in some cases, so it's important to be careful. Too many locks may also start harming performance elsewhere.

While your use case might differ, some of the ideas presented above might still hold true. Most concepts are generally applicable and don't depend on a specific implementation, however, I found DataLoader extremely useful for batching and deduplication purposes. Combined with locking it allows serial batches to load data efficiently.

If you have any questions, suggestions, or other feedback, reach out on Twitter or send a mail!