Nov 20th, 2021

🎇 Running Untrusted Workloads with Firecracker and containerd

Firecracker was designed by AWS to run untrusted workloads in a secure environment for its Lambda and Fargate services while retaining the fast startup times and low resource overhead you expect from serverless functions and containers. Workloads deployed with Firecracker run as extremely lightweight microVMs, so a single host system can run thousands of multi-tenant VMs at a time.

On its own, Firecracker is a bit tedious to use if you want features like lifecycle management (restarting VMs when they exit) or deployments across multiple machines. Luckily, there are several ways to make Firecracker a runtime for containerd, the interface that manages the complete container lifecycle for Docker, Kubernetes, and other container-based systems.

In this guide, we'll cover all the steps needed to get you deploying isolated workloads on Firecracker through Kata Containers and Nomad. The pieces of this setup can be swapped out at various levels: you could use Kubernetes for orchestration, or firecracker-containerd instead of Kata Containers.

Firecracker as runtime for containerd

I mentioned that there are multiple ways to integrate Firecracker with containerd. A straightforward way is to use Kata Containers, a secure container runtime that offers Firecracker support out of the box.

For the rest of this guide, I'll assume you're working with a machine running Ubuntu 20.04 that supports KVM (and nested virtualization, if you're working inside a VM yourself). You can check this by running kvm-ok (you might have to install a package for this, but your system will prompt you to do so).
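For reference, on Ubuntu the check looks like this (kvm-ok is part of the cpu-checker package):

```shell
# kvm-ok ships with the cpu-checker package
sudo apt install -y cpu-checker
# Prints "KVM acceleration can be used" if /dev/kvm is available
kvm-ok
```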

Set up Docker

If you haven't done so before, install Docker. You can head over to the official documentation and follow the instructions.

Download Kata Containers

Up next, we'll download the Kata Containers release bundle and link the runtime and containerd shim into place.

wget https://github.com/kata-containers/kata-containers/releases/download/2.1.1/kata-static-2.1.1-x86_64.tar.xz
xzcat kata-static-2.1.1-x86_64.tar.xz | sudo tar -xvf - -C /
sudo ln -s /opt/kata/bin/kata-runtime /usr/local/bin
sudo ln -s /opt/kata/bin/containerd-shim-kata-v2 /usr/local/bin
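To confirm the binaries are wired up correctly, you can print the runtime version and run Kata's built-in host check (both are subcommands of kata-runtime in the 2.x release line):

```shell
kata-runtime --version
# Verifies KVM support, CPU flags, and other host requirements
sudo kata-runtime check
```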

Registering Kata Containers as a containerd runtime

With this in place, we can register Kata Containers as a runtime for containerd by editing the configuration file at /etc/containerd/config.toml:

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
        runtime_type = "io.containerd.kata.v2"

After restarting containerd, we can start a container running on Kata Containers.

sudo systemctl restart containerd

To verify that we can run a container with the Kata Containers runtime, run:

sudo ctr image pull docker.io/library/ubuntu:latest
sudo ctr run --runtime io.containerd.run.kata.v2 -t --rm docker.io/library/ubuntu:latest ubuntu-kata-test /bin/bash

Setting up devmapper

Up next, we need to set up the devmapper snapshotter plugin for containerd, which stores snapshots in a device-mapper thin pool. This is necessary because Firecracker needs container images backed by block devices it can attach to the microVM, which the default overlayfs snapshotter can't provide. We'll use a devmapper configuration based on loopback devices, which are quite slow, so you should not use this setup anywhere near production environments!

You need to have bc installed for the next operation, as the script below uses it to calculate the thin pool's length in sectors.

sudo apt install -y bc

To configure our devmapper thin pool, we'll create a new bash script:

#!/bin/bash
set -ex

DATA_DIR=/var/lib/containerd/devmapper
POOL_NAME=devpool

# /var/lib/containerd is owned by root
sudo mkdir -p "${DATA_DIR}"

# Create data file
sudo touch "${DATA_DIR}/data"
sudo truncate -s 100G "${DATA_DIR}/data"

# Create metadata file
sudo touch "${DATA_DIR}/meta"
sudo truncate -s 10G "${DATA_DIR}/meta"

# Allocate loop devices
DATA_DEV=$(sudo losetup --find --show "${DATA_DIR}/data")
META_DEV=$(sudo losetup --find --show "${DATA_DIR}/meta")

# Define thin-pool parameters.
# See https://www.kernel.org/doc/Documentation/device-mapper/thin-provisioning.txt for details.
SECTOR_SIZE=512
DATA_SIZE="$(sudo blockdev --getsize64 -q ${DATA_DEV})"
LENGTH_IN_SECTORS=$(bc <<< "${DATA_SIZE}/${SECTOR_SIZE}")
DATA_BLOCK_SIZE=128
LOW_WATER_MARK=32768

# Create a thin-pool device
sudo dmsetup create "${POOL_NAME}" \
    --table "0 ${LENGTH_IN_SECTORS} thin-pool ${META_DEV} ${DATA_DEV} ${DATA_BLOCK_SIZE} ${LOW_WATER_MARK}"

cat << EOF
#
# Add this to your config.toml configuration file and restart containerd daemon
#
[plugins]
  [plugins.devmapper]
    pool_name = "${POOL_NAME}"
    root_path = "${DATA_DIR}"
    base_image_size = "10GB"
    discard_blocks = true
EOF

After creating this, make it executable and run it once. At the end of the output, you should see a new section that must be added to the configuration file. Since we're dealing with TOML, merge it under the existing [plugins] table instead of adding a duplicate [plugins] definition.
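Merged together, the relevant parts of /etc/containerd/config.toml might now look like this (pool name and path taken from the script above; note that section names can differ between containerd config versions):

```toml
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
        runtime_type = "io.containerd.kata.v2"
  [plugins.devmapper]
    pool_name = "devpool"
    root_path = "/var/lib/containerd/devmapper"
    base_image_size = "10GB"
    discard_blocks = true
```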

Also set devmapper as the snapshotter to use:

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "devmapper"

Once again, restart containerd for the changes to take effect.
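To check that the thin pool and snapshotter are actually in place, you can list the device-mapper devices and containerd's loaded plugins:

```shell
# The thin pool created by our script should show up here
sudo dmsetup ls
# The devmapper snapshotter plugin should be listed with status "ok"
sudo ctr plugins ls | grep devmapper
```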

Setting up Firecracker runtime

With the devmapper in place, we can finally add our Firecracker runtime. Instead of overwriting the Kata Containers runtime we just set up, we'll add an additional runtime by creating a new script at /usr/local/bin/containerd-shim-kata-fc-v2:

#!/bin/bash
KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-fc.toml exec /opt/kata/bin/containerd-shim-kata-v2 "$@"

Once again, make this file executable (sudo chmod +x /usr/local/bin/containerd-shim-kata-fc-v2) and open up the containerd config again to add the Firecracker runtime below the Kata runtime.

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-fc]
  runtime_type = "io.containerd.kata-fc.v2"

And don't forget to restart containerd again to apply all configuration changes!

Running containers manually

We've got everything in place to run isolated workloads with Firecracker now. Since we haven't configured any orchestration system yet, we'll just start a container manually, using the containerd CLI:

sudo ctr images pull --snapshotter devmapper docker.io/library/ubuntu:latest
sudo ctr run --snapshotter devmapper --runtime io.containerd.run.kata-fc.v2 -t --rm docker.io/library/ubuntu:latest ubuntu-kata-fc-test uname -a

If the container prints its output successfully, all steps worked and you're running a completely isolated VM right now, which in turn runs your specified container image, slightly adjusted to be executable in this new context.

Notice that you must pass both the devmapper snapshotter and the kata-fc runtime for your containers to actually use the right configuration. For tools that invoke containerd, we can use namespace labels to set defaults for these options, so as long as the tools don't override them, no further changes are needed.

sudo ctr namespaces create fc
sudo ctr namespaces label fc \
  containerd.io/defaults/runtime=io.containerd.run.kata-fc.v2 \
  containerd.io/defaults/snapshotter=devmapper

With this set, you can now start up a container without the snapshotter or runtime arguments!

sudo ctr -n fc images pull docker.io/library/ubuntu:latest
sudo ctr -n fc run -t --rm docker.io/library/ubuntu:latest ubuntu-kata-fc-test uname -a

Most of the failing containers I experienced came down to the system invoking containerd with different options, or somehow bypassing the runtime and snapshotter defaults. So I'll highlight it once more: really make sure your container is running with the Firecracker runtime, otherwise you're not getting the isolation you want!
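One sanity check I find useful: compare kernel versions. A container running inside a Firecracker microVM uses its own guest kernel, so uname -r inside the container should differ from the host (the container name kata-fc-check below is just an example):

```shell
# Kernel version on the host
uname -r
# Kernel version inside the microVM; a different value confirms VM isolation
sudo ctr -n fc run --rm docker.io/library/ubuntu:latest kata-fc-check uname -r
```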

Automating deployments with Nomad

One possible way to orchestrate Firecracker VMs is to use Nomad, a more self-contained alternative to Kubernetes.

First, adjust your Nomad config in /etc/nomad.d/nomad.hcl and add the plugin stanza at the end of the file.

plugin "containerd-driver" {
  config {
    enabled = true
    # Run Firecracker containers!
    containerd_runtime = "io.containerd.run.kata-fc.v2"
    stats_interval = "5s"
  }
}

Once this is set, we can label the nomad containerd namespace so all containers use the Firecracker runtime and devmapper snapshotter by default. Doing this means we can avoid forking the containerd driver for Nomad.

sudo ctr namespaces create nomad
sudo ctr namespaces label nomad \
  containerd.io/defaults/runtime=io.containerd.run.kata-fc.v2 \
  containerd.io/defaults/snapshotter=devmapper

Up next, we need to download the containerd task driver for Nomad, which is maintained by Roblox.

sudo mkdir -p /opt/nomad/data/plugins && cd /opt/nomad/data/plugins
sudo curl -L -o containerd-driver https://github.com/Roblox/nomad-driver-containerd/releases/download/v0.9.2/containerd-driver
sudo chmod +x containerd-driver

To expose our service, we also need to download the Container Networking Interface (CNI) plugins.

sudo mkdir -p /opt/cni/bin
wget https://github.com/containernetworking/plugins/releases/download/v1.0.1/cni-plugins-linux-amd64-v1.0.1.tgz
sudo tar -xvzf cni-plugins-linux-amd64-v1.0.1.tgz -C /opt/cni/bin
rm cni-plugins-linux-amd64-v1.0.1.tgz

And one last time, restart our Nomad deployment.

sudo systemctl restart nomad

With Nomad deployments, you're usually dealing with multiple machines, so you need to apply those changes on each node.

An example task

To test if everything worked, try to create the following job. It's configured to use the containerd driver, which picks up our namespace labels to determine the default runtime, which instructs containerd to use the Kata Containers Firecracker runtime, which internally sets up a Firecracker VM that runs our image. That's a lot of layers of abstraction, but it does a great job!

job "nginx" {
  datacenters = ["dc1"]

  group "nginx-group" {
    reschedule {
      attempts  = 1
      interval  = "1h"
      delay     = "10s"
      unlimited = false
    }

    task "nginx-task" {
      driver = "containerd-driver"
      config {
        image = "docker.io/library/nginx:alpine"
      }
      resources {
        cpu    = 500
        memory = 256
      }
    }

    network {
      mode = "bridge"
      port "http" {
        to = 80
      }
      port "https" {
        to = 443
      }
    }
  }
}

And that's it. Once the container starts up, you have a completely isolated instance of Nginx available at the random port Nomad allocated for you.
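Assuming you saved the job file above as nginx.nomad, submitting it and finding the allocated port works with the standard Nomad CLI:

```shell
# Submit the job to the cluster
nomad job run nginx.nomad
# Shows the job's allocations; copy the allocation ID
nomad job status nginx
# The allocation details include the dynamically assigned host ports
nomad alloc status <allocation-id>
```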


Even though we didn't have to make any deep code changes, we went through quite a lot of steps here, so if you just want to deploy your own images, this setup probably isn't for you. But if you're interested in running untrusted workloads on bare-metal infrastructure, this is a setup you can start your journey with!

Bruno Scheufler
