Jul 25, 2021

Improving DataLoader Stack Traces

DataLoader is an elegant solution for batching access to your application data layer, and optionally caching it for subsequent access. With the rise of GraphQL, DataLoader has grown in popularity too, as it can be used for accessing data in multiple layers, caching it the first time, and returning the cached resource immediately on lower layers.

Internally, DataLoader keeps a batch until the condition to dispatch it is met, usually when the frame of execution ended (i.e. on the next tick). This helps to collect multiple attempts of loading resources and combining them in one round-trip to the database compared to individual queries for each item.

While the batching is incredibly valuable for your application's performance, it moves the process of loading into another context, by nature of the scheduling. This means that if you encounter an error while loading data and throw it, it will be returned to every point that invoked DataLoader but will only contain a stack trace starting from the DataLoader internals when the batch was dispatched.

To make this easier to grasp, imagine the following example code

const DataLoader = require('dataloader');

const dl = new DataLoader(async keys =>
  keys.map(k => new Error(`Could not load user ${k}`))
);

async function getUser() {
  try {
    await dl.load('1');
  } catch (err) {
    console.log(err);
  }
}

async function someResolver() {
  await getUser();
}

async function main() {
  await someResolver();
}

main();

This code attempts to load a single item from the DataLoader, which will always return a rejected promise. The error we log looks like this

Error: Could not load 1
    at .../example.js:5:17
    at Array.map (<anonymous>)
    at DataLoader._batchLoadFn .../example.js:5:8)
    at dispatchBatch (.../node_modules/dataloader/index.js:275:29)
    at .../node_modules/dataloader/index.js:258:12
    at processTicksAndRejections (node:internal/process/task_queues:76:11)

As you can see, while it contains the location where the error was created and even the map statement enclosing it, our stack trace never shows where exactly we invoked the DataLoader. This happens because invoking the batch load function (i.e. the function receiving keys and returning values) happens inside of DataLoader, disconnected from us calling .load.

Fortunately, though, this can be solved. In Node.js, we have a utility function Error.captureStackTrace which allows us to retrieve the current stack. In case our attempt to invoke the data loader fails, we can attach the current stack to the error DataLoader returned.

async function wrapLoader<K, V>(loader: DataLoader<K, V>, key: K) {
  try {
    const res = await loader.load(key);
    return res;
  } catch (err) {
    if (err instanceof Error && err.stack) {
      const currentStack = { stack: '' };
      Error.captureStackTrace(currentStack, wrapLoader);

      err.stack += currentStack.stack.replace('Error', '\nBefore DataLoader');
    }

    throw err;
  }
}

The new wrapLoader function receives your DataLoader and the key you're trying to load and will make sure to attach the external stack to the DataLoader stack, if present. A similar approach could be added for loading multiple keys, and you might even use inheritance to extend the original DataLoader class with improved error handling.

Changing our example from earlier to

async function getUser() {
  try {
    await wrapLoader(dl, '1');
  } catch (err) {
    console.log(err);
  }
}

now yields the following stack trace

Error: Could not load 1
    at .../example.js:18:17
    at Array.map (<anonymous>)
    at DataLoader._batchLoadFn (.../example.js:18:8)
    at dispatchBatch (.../node_modules/dataloader/index.js:275:29)
    at .../node_modules/dataloader/index.js:258:12
    at processTicksAndRejections (node:internal/process/task_queues:76:11)
Before DataLoader
    at processTicksAndRejections (node:internal/process/task_queues:94:5)
    at async getUser (.../example.js:23:5)
    at async someResolver (.../example.js:30:3)
    at async main (.../example.js:34:3)

With this simple addition, we combined both the stack trace returned from the DataLoader internals as well as the current stack for each location that called our DataLoader.

We found and solved this issue while building GraphQL services at Hygraph and credits to the solution fully go to my co-worker, Jonas. We initially captured the stack trace before running the DataLoader step, but the differences when running it only in the error case were negligible on the stack trace and probably lighter on the performance of your service.