Generating Dynamic GitHub Actions Workflows with the Job Matrix Strategy

GitHub Actions is becoming one of the major CI providers, benefitting hugely from its tight integration with GitHub's other features. In this post, I'll walk through a feature that is seemingly inconspicuous but can become quite powerful if used right: job strategies, and more precisely, the matrix strategy.

Within GitHub Actions Workflows, everything you want to run needs to be declared as a job with steps. This is great until you have quite similar workflows with only a few variations, such as builds for different versions, or infrastructure-as-code deployments of different services and targets.

Using the matrix strategy allows you to write a job once and pass in several variants that the job will run for. You write a baseline set of steps and other job details, and access individual values such as the current version via the matrix context.

A static matrix

Let's try to understand the matrix strategy with the simple example of running a build for multiple versions of Node.js.

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node: [10, 12, 14]
    steps:
      # Configures the node version used on GitHub-hosted runners
      - uses: actions/setup-node@v2
        with:
          # The Node.js version to configure
          node-version: ${{ matrix.node }}

In this example, we provided a parameter called node to the matrix, with a list of major versions we want to target. For each of these versions, we will run the job once, setting the matrix context to the current version. We can then access the current node version with ${{ matrix.node }}.

If this sounds abstract, think of it as a for loop:

// Stand-in for GitHub Actions starting one job with the given context
const runJob = (context) => console.log(JSON.stringify(context))

const matrixNode = [10, 12, 14]
for (const node of matrixNode) {
  runJob({ matrix: { node } })
}

// -> runJob({ matrix: { node: 10 } })
// -> runJob({ matrix: { node: 12 } })
// -> runJob({ matrix: { node: 14 } })

You can also specify multiple matrix configurations for a job:

matrix:
  os: [ubuntu-18.04, ubuntu-20.04]
  node: [10, 12, 14]
# The matrix above generates the following jobs:
# os: ubuntu-18.04 node: 10
# os: ubuntu-18.04 node: 12
# os: ubuntu-18.04 node: 14
# os: ubuntu-20.04 node: 10
# os: ubuntu-20.04 node: 12
# os: ubuntu-20.04 node: 14

With this, GitHub Actions will determine all combinations of the two operating systems and three versions, resulting in six total jobs.
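To make the combination logic concrete, here is a minimal sketch of how such a matrix could be expanded. The function name expandMatrix is hypothetical and only illustrates the idea; it is not how GitHub Actions is implemented internally.

```javascript
// Expand a matrix definition into every combination of its values.
// Each key multiplies the existing combinations by its number of values.
function expandMatrix(matrix) {
  return Object.entries(matrix).reduce(
    (combos, [key, values]) =>
      combos.flatMap((combo) =>
        values.map((value) => ({ ...combo, [key]: value }))
      ),
    [{}]
  );
}

const jobs = expandMatrix({
  os: ["ubuntu-18.04", "ubuntu-20.04"],
  node: [10, 12, 14],
});

// Prints one line per generated job, six in total
for (const job of jobs) {
  console.log(`os: ${job.os} node: ${job.node}`);
}
```

Adding a third key with two values would double the result to twelve jobs, which is worth keeping in mind as a matrix grows.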

Of course, we had to specify each version we want to run on manually, so this workflow fits best if you don't often change the matrix configuration, or if it's fine to do so manually.

Contexts available in the strategy

Previously, we declared our matrix configuration statically, so for any change, we would have to edit the workflow configuration file. If your matrix configuration is more dynamic, or if you want a single source of truth for which jobs to generate, it's worth checking whether there are other ways to pass in the matrix configuration.

Fortunately, the Actions documentation includes a helpful page explaining contexts and their availability. Looking up the strategy key there, we can see that the env context is unfortunately not available for the strategy, but the needs context is. This means we can chain two jobs together: one that retrieves the matrix configuration, and a second one that consumes it to generate a dynamic number of jobs.

Using a previous job's outputs

Checking out the documentation, we found out that you can use a previous job's output as input for a job strategy, including the matrix configuration. We can use this fact to dynamically generate our matrix configuration.

name: build
on: push
jobs:
  job1:
    runs-on: ubuntu-latest
    outputs:
      # This needs to match your step's id and name parameters
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      # Important: Do not forget the id!
      - id: set-matrix
        run: echo "::set-output name=matrix::{\"node\":[10, 12, 14]}"
  job2:
    needs: job1
    runs-on: ubuntu-latest
    strategy:
      # This needs to match the first job's name and output parameter
      matrix: ${{fromJSON(needs.job1.outputs.matrix)}}
    steps:
      - run: build
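This example showcases how we can declare two jobs: the first outputs our matrix configuration by using the ::set-output workflow command to set an output parameter, and the second runs only once the first one completes and uses that output as its strategy. Note that GitHub has since deprecated ::set-output in favor of appending name=value lines to the file referenced by $GITHUB_OUTPUT, so newer workflows should prefer that form.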

We pass the matrix configuration as a JSON string, so in the second job, we parse it using the fromJSON function, as the strategy requires objects or arrays to work with.
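The round trip between the two jobs can be sketched in plain JavaScript. This is only an analogy: JSON.parse stands in for the role fromJSON plays in the workflow expression, and the variable names are made up for illustration.

```javascript
// job1 side: serialize the configuration. JSON.stringify takes care of the
// quoting that the workflow above escapes by hand.
const payload = JSON.stringify({ node: [10, 12, 14] });
console.log(payload); // {"node":[10,12,14]}

// job2 side: the output arrives as a plain string ...
console.log(typeof payload); // string

// ... and parsing it (the role fromJSON plays in the workflow) restores
// an object with a real array that the matrix can iterate over.
const matrix = JSON.parse(payload);
console.log(Array.isArray(matrix.node), matrix.node.length); // true 3
```

Without the parsing step, the strategy would only see a single string rather than a list of versions.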

Our example job ends up with pretty much the same matrix configuration as the previous static example, but this time, we can use environment variables or any command to generate our workflow dynamically.

Example: From environment variables

With a separate preparation job, we can use environment variables to hold our matrix configuration.

name: build
on: push
env:
  MATRIX: "{\"node\":[10, 12, 14]}"
jobs:
  job1:
    runs-on: ubuntu-latest
    outputs:
      # This needs to match your step's id and name parameters
      matrix: ${{ steps.set-matrix.outputs.matrix }}
    steps:
      # Important: Do not forget the id!
      - id: set-matrix
        run: echo "::set-output name=matrix::$MATRIX"
  job2:
    needs: job1
    runs-on: ubuntu-latest
    strategy:
      # This needs to match the first job's name and output parameter
      matrix: ${{fromJSON(needs.job1.outputs.matrix)}}
    steps:
      - run: build

This way, we just need to update the environment variable at the top, instead of moving through the complete workflow and finding places to update.

Example: Run for all Pulumi stacks

As a final example, we can improve the experience for infrastructure-as-code tooling in CI by using said tools as the source of truth. In this case, we'll use the Pulumi stack files to list all stacks we should run through, and use that as the matrix configuration. And whenever we add a new stack, it'll automatically be included.

job1:
  runs-on: ubuntu-latest
  outputs:
    matrix: ${{ steps.set-matrix.outputs.matrix }}
  steps:
    - uses: actions/checkout@v2
    - id: set-matrix
      run: echo "::set-output name=matrix::$(ls Pulumi.*.yaml | sed s/Pulumi\.// | sed s/\.yaml// |  jq -Rsc '. / "\n" - [""]')"
job2:
  name: ${{ matrix.stack }}
  needs: job1
  runs-on: ubuntu-latest
  strategy:
    matrix:
      stack: ${{fromJSON(needs.job1.outputs.matrix)}}

The command can be a bit difficult to read, but it lists all Pulumi stack files in the current directory, removes the prefix and suffix so only the stack name remains, and formats the result as a JSON array in the form that we need in the second job.
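If the shell pipeline is hard to follow, the same transformation can be sketched in JavaScript. The file names below are hypothetical, and stackNames is an illustrative helper rather than anything provided by Pulumi or GitHub Actions.

```javascript
// Hypothetical directory listing; in the workflow this comes from
// `ls Pulumi.*.yaml`.
const files = ["Pulumi.dev.yaml", "Pulumi.staging.yaml", "Pulumi.prod.yaml"];

// Keep only stack files and strip the "Pulumi." prefix and ".yaml" suffix.
function stackNames(fileNames) {
  return fileNames
    .map((name) => /^Pulumi\.(.+)\.yaml$/.exec(name))
    .filter((match) => match !== null)
    .map((match) => match[1]);
}

// Compact JSON array, the same shape the jq call produces
console.log(JSON.stringify(stackNames(files))); // ["dev","staging","prod"]
```

The project file Pulumi.yaml has no stack name between the prefix and the suffix, so it is skipped here just as the glob in the workflow skips it.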

This way, we run a job for each stack, which is reflected in the job name as well.

With the matrix strategy, you can make your GitHub Actions workflows incredibly dynamic and versatile, using one source of truth such as another tool to generate as many jobs as you need.

One important limit you should take into account is that a job matrix can only generate up to 256 jobs per workflow run. If your use case would result in more than that, you might need to think about a different approach and investigate if GitHub Actions is the best fit for your case.
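Since every matrix key multiplies the job count, it can help to estimate the total before relying on a generated matrix. The following is a rough sketch with hypothetical names (countJobs, MAX_MATRIX_JOBS) and made-up data.

```javascript
// Limit mentioned above: a matrix can generate at most 256 jobs per run
const MAX_MATRIX_JOBS = 256;

// The job count is the product of the number of values for each key.
function countJobs(matrix) {
  return Object.values(matrix).reduce(
    (total, values) => total * values.length,
    1
  );
}

const small = { os: ["ubuntu-18.04", "ubuntu-20.04"], node: [10, 12, 14] };

// Hypothetical large matrix: 20 stacks across 15 regions
const large = {
  stack: Array.from({ length: 20 }, (_, i) => `stack-${i}`),
  region: Array.from({ length: 15 }, (_, i) => `region-${i}`),
};

for (const [name, matrix] of Object.entries({ small, large })) {
  const total = countJobs(matrix);
  const verdict = total <= MAX_MATRIX_JOBS ? "ok" : "too many jobs";
  console.log(`${name}: ${total} (${verdict})`);
}
```

A check like this could run in the preparation job, failing early with a clear message instead of letting the workflow hit the limit.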