May 02, 2021

Running a commit-based release infrastructure

We spend a huge amount of time thinking about building our products, but less about the pipeline that delivers features from engineering to production. Ideally, we want to test the latest changes before they're released to all our users, even if we carefully hide them behind feature toggles.

We also want to make sure the exact same code runs across all environments, in the same configuration, so we don't miss problems due to differences in infrastructure.

And last but not least, if something slipped through to production and everything is on fire, we need to be able to roll back the deployment to a previous state, where everything worked. While there are different reasons a deployment might fail, we will focus on issues that can be rolled back by switching to a previous commit. To enable this workflow, we need to keep a linear history, isolating every change in one commit. This is usually done by squashing all commits into one when merging, leaving you with one commit per merged pull request and a clean history.

In today's world of infrastructure, some system components might be defined as code, while others are deployed using external service providers. The variable that ties it all together is the commit. The commit reference identifies your codebase at a specific point in time and includes all services in the repository. To make the most out of this flow, we'll use a monorepo for the full stack, from frontend to backend.

The Deployment Flow

Every change will be worked on inside a branch created from your main branch. Once the work is ready to be deployed to staging, you can open a pull request to your main branch.

In the pull request, you can add automation for running tests, verifying code quality is acceptable, and even creating a branch preview, spinning up an extra environment for seeing the changes in action. The latter might be important as all commits in staging should ideally work as expected so a release or rollback can be done at any point in time.

Once the pull request is reviewed and approved, you can squash merge it into main. This push event will trigger a CI job that will build all services as Docker images and push them to your registry of choice. The images are tagged with the current commit SHA, so you can use the same image for staging and production. If you're worried that building all services may be inefficient, make sure to check out layer caching to restore state from a previous run, which should speed builds up considerably!
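To make the tagging scheme concrete, here is a minimal sketch of how a CI job could derive image references from a commit. The registry host and the service names are placeholders I made up for illustration; swap in whatever your setup uses.

```python
import re

# Hypothetical values for illustration only.
REGISTRY = "registry.example.com/acme"
SERVICES = ["frontend", "backend"]

# A full Git commit SHA-1 is 40 lowercase hexadecimal characters.
_SHA_RE = re.compile(r"[0-9a-f]{40}")


def image_refs(commit_sha, services=SERVICES, registry=REGISTRY):
    """Return one image reference per service, all tagged with the same commit."""
    if not _SHA_RE.fullmatch(commit_sha):
        # Refuse branch names or short SHAs: a tag must identify exactly one commit.
        raise ValueError(f"expected a full commit SHA, got {commit_sha!r}")
    return {name: f"{registry}/{name}:{commit_sha}" for name in services}
```

Because every service shares the same tag, staging and production can resolve their images from nothing more than a commit SHA.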

Once built, the CI runner will continue to deploy the images to the staging environment and run any other deployment-related tasks (provisioning other parts of your infrastructure).

After all jobs have run to completion, you can preview the change in a dedicated staging environment. This environment should be an exact copy of production: apart from the domains, nothing should really vary, so you can perform real-world QA.

If the team is happy with everything and you have collected a couple of changes you want to release, you can open a pull request that updates your infrastructure-as-code to match the new commit. In this release PR, you might run CI tasks to preview any changes to your infrastructure.
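What "updating your infrastructure-as-code to match the new commit" looks like depends on your tooling. As one possible shape, assume the released commit is pinned in a small JSON manifest with a `commit` key (a made-up format for this sketch, not something a specific tool requires). The release PR would then be a one-line change produced by something like:

```python
import json


def pin_release(manifest_text, commit_sha):
    """Return the updated manifest text and the previously pinned commit.

    The manifest layout (a JSON object with a "commit" key) is an assumption
    made for this sketch.
    """
    manifest = json.loads(manifest_text)
    previous = manifest.get("commit")
    manifest["commit"] = commit_sha
    # Stable key order and indentation keep the release PR diff minimal.
    updated = json.dumps(manifest, indent=2, sort_keys=True) + "\n"
    return updated, previous
```

Returning the previously pinned commit is handy for the pull request description, as it lets reviewers see exactly which range of commits is about to go out.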

Once this pull request is merged, instead of deploying staging again, which would not make a difference, we will invoke a different CI pipeline that will deploy the current state to production.
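How CI tells a release merge apart from a regular merge is up to you. One simple option, assuming the release PR only touches a single release file (the path below is hypothetical), is to decide based on the files changed by the pushed commit:

```python
# Hypothetical path of the file that pins the released commit.
RELEASE_FILE = "infrastructure/release.json"


def pipelines_for(changed_paths, release_file=RELEASE_FILE):
    """Decide which deployment pipelines a push to main should trigger."""
    changed = set(changed_paths)
    pipelines = []
    if changed - {release_file}:
        # Anything besides the release file changed: build and deploy to staging.
        pipelines.append("staging")
    if release_file in changed:
        # The pinned commit changed: deploy that commit to production.
        pipelines.append("production")
    return pipelines
```

Most CI systems also offer path filters that express the same rule declaratively, so treat this as a description of the decision rather than something you have to script yourself.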

As this flow is built around reactive CI pipelines, which might run at any point, you should design your deployment tasks to be idempotent: every run should only converge the environment toward the state described by the current commit, so that repeated or overlapping deployments don't run into conflicts.
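A common way to get idempotent deployments is to compare what is running with what should be running and only act on the difference. The sketch below models both states as plain dictionaries mapping a service name to an image reference; in a real setup, `running` would come from your deployment target, which is out of scope here.

```python
def plan_deployment(running, desired):
    """Return the actions needed to move from `running` to `desired`.

    Both arguments map a service name to an image reference.
    """
    actions = []
    for service, image in sorted(desired.items()):
        if running.get(service) != image:
            actions.append(("deploy", service, image))
    for service in sorted(set(running) - set(desired)):
        actions.append(("remove", service, running[service]))
    return actions


def apply_actions(running, actions):
    """Apply planned actions to a copy of the running state and return it."""
    result = dict(running)
    for kind, service, image in actions:
        if kind == "deploy":
            result[service] = image
        else:
            result.pop(service, None)
    return result
```

Planning again after the actions have been applied yields an empty list, so triggering the pipeline twice for the same commit does no extra work.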

Managing and running your deployments and releases in CI, close to your version control, removes load from subsequent parts of your system as the real infrastructure is only concerned with running your software, not preparing it to do so. You can also easily swap out certain parts (using another registry, build steps, or deployment targets) to match your use cases and scale over time. Built right, you could start off with a single machine that runs a docker-compose deployment and scale up to managed container services in your cloud of choice later on.

Rolling back

Now that your infrastructure is up and running, imagine you release a change that turns out to break your application for your customers. One way to get back to a running system could be rolling back to a previous version which worked fine. Of course, depending on the actual incident you might not be able to roll back as easily, so those cases should be minimized as much as possible.

If you decide to roll back, simply change the commit your services are running on. As you still have the service images in the registry and the tooling that allows you to deploy any given commit, the rollback isn't any different from a regular release; it just goes one step back.
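As a small illustration, assume you keep a list of released commits, oldest first (where that list comes from, for example the history of your release file, is up to you). Picking the rollback target is then a lookup, and the returned commit goes through the same release steps as any other:

```python
def rollback_target(released_commits, current):
    """Return the commit released directly before `current`.

    `released_commits` is ordered oldest first; this ordering is an
    assumption of the sketch.
    """
    index = released_commits.index(current)  # raises ValueError if unknown
    if index == 0:
        raise ValueError("no earlier release to roll back to")
    return released_commits[index - 1]
```

Since the images for that commit already exist in the registry, nothing has to be rebuilt.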

I hope you enjoyed this post, diving deep into how we can achieve delivery pipelines that are transparent and scale efficiently. If you have any questions, suggestions, or feedback in general, send a mail or contact me on Twitter.