Nov 27, 2021

Scheduling Tasks with Topological Sorting

Imagine you want to bake a cake: There are multiple tasks you need to follow to end up with the final dish, from gathering and preparing ingredients over setting up the oven to finally baking it. Most, if not all, tasks depend on the completion of their predecessor.

To represent this in a computer science context, and go from a shuffled list of tasks to a sorted list of steps you need to take one after another to get to the result, you need a data structure that is made for this context. And as it so often is the case, we'll need a directed graph.

The problem of creating a valid execution sequence for steps with dependencies is widely used for actual dependency graphs, infrastructure-as-code runtimes, and other scheduling contexts.

Directed Acyclic Graphs

In our task graph, which I'll refer to as dependency graph in the following, our nodes represent individual tasks or steps to be taken, and the directed edges connect steps in the order they must be processed. Since we cannot allow circular dependencies, that is, no earlier tasks may depend on later tasks, we will work with directed acyclic graphs.

I've prepared the basic Graph data structure as a class using TypeScript below.

/**
 * A graph data structure
 */
export class Graph<T> {
  /**
   * Nodes in the graph
   */
  nodes: T[] = [];

  /**
   * Edges between nodes
   */
  edges: [T, T][] = [];

  constructor(nodes: T[] = [], edges: [T, T][] = []) {
    this.nodes = nodes;
    this.edges = edges;
  }

  /**
   * Add a node to the graph
   * @param t
   */
  addNode(t: T) {
    this.nodes.push(t);
  }

  /**
   * Add an edge between two nodes
   * @param from
   * @param to
   */
  addEdge(from: T, to: T) {
    this.edges.push([from, to]);
  }

  /**
   * Remove an edge from the graph
   * @param from
   * @param to
   */
  removeEdge(from: T, to: T) {
    this.edges = this.edges.filter(([f, t]) => !(f === from && t === to));
  }
}

As you can see, other than storing nodes and edges, we're not doing much here yet. Up next, let's prepare our example graph.

Setting up our example

Keeping our cake example, we'll create a node for each step and connect steps with edges to create our order. For readability, we add nodes and edges in the right order, but try to shuffle it around to see that our ordering really works as intended and doesn't just return items in the order we added them.

const g = new Graph<string>();

g.addNode('Gather your ingredients');

g.addNode('Preheat the oven');
g.addEdge('Gather your ingredients', 'Preheat the oven');

g.addNode('Cream the butter and sugar');
g.addEdge('Preheat the oven', 'Cream the butter and sugar');

g.addNode('Add the eggs and vanilla');
g.addEdge('Cream the butter and sugar', 'Add the eggs and vanilla');

g.addNode('Stir in the cake flour');
g.addEdge('Add the eggs and vanilla', 'Stir in the cake flour');

g.addNode('Pour the batter into the pan');
g.addEdge('Stir in the cake flour', 'Pour the batter into the pan');

g.addNode('Bake the cake for 1 hour 15 minutes');
g.addEdge(
  'Pour the batter into the pan',
  'Bake the cake for 1 hour 15 minutes'
);

console.log(g.nodes);
console.log(g.edges);

Ordering nodes with Topological Sorting

As our edges represent scheduling constraints, making sure our task nodes are performed only when all their predecessors are completed, we can make use of topological sorting (or topological ordering) to obtain a valid sequence for our tasks that respects all dependencies.

While there are many possible ways to implement topological sorting, we'll use Kahn's algorithm which runs in linear time (number of nodes + number of edges). The result of running Kahn's algorithm is either a list of sorted nodes (in order of the valid sequence) or an error in case the graph contains a cycle and thus is not a directed acyclic graph.

In our implementation, we clone the graph since it is modified during the sorting operation.

/**
 * Perform topological sort on the graph (Kahn's algorithm)
 *
 * Either returns the sorted list of nodes or throws an error if the graph is not acyclic
 */
export function kahns<T>(input: Graph<T>) {
  const clonedGraph = new Graph<T>([...input.nodes], [...input.edges]);

  const L: T[] = [];
  const S: T[] = clonedGraph.nodes.filter(
    n => clonedGraph.edges.filter(([_, to]) => to === n).length === 0
  );

  // while S is not empty
  while (S.length > 0) {
    // remove a node n from S
    const n = S.shift();
    if (!n) {
      break;
    }
    // add n to L
    L.push(n);

    // for each node m with an edge e from n to m do
    for (const m of clonedGraph.edges
      .filter(([from, _]) => from === n)
      .map(([_, to]) => to)) {
      // remove edge e from the graph
      clonedGraph.removeEdge(n, m);
      // if m has no other incoming edges then
      if (clonedGraph.edges.filter(([_, to]) => to === m).length === 0) {
        // insert m into S
        S.push(m);
      }
    }
  }

  // if graph has edges then
  if (clonedGraph.edges.length > 0) {
    // return error (graph has at least one cycle)
    throw new Error('Graph has cycles');
  }

  return L;
}

Running the new function on our example graph now yields the sorted sequence of tasks to bake our cake. Shuffling the order of insertion of our nodes/edges will not modify the result. It's still important to acknowledge that the order of returned nodes depends on the order that nodes n are removed from the set S.

console.log(kahns(g));
[
  "Gather your ingredients",
  "Preheat the oven",
  "Cream the butter and sugar",
  "Add the eggs and vanilla",
  "Stir in the cake flour",
  "Pour the batter into the pan",
  "Bake the cake for 1 hour 15 minutes"
]

This implementation is a very simple introduction to the topic of directed acyclic graphs and topological ordering, and there are many optimizations such as adding a tiebreaker to the sorting procedure to get a consistent result or using parallel sorting when dealing with much larger graphs.