This post goes over some of our architecture work for the first iteration of Anzu. We’re currently hard at work building the next big step toward our vision of helping teams ship faster, so check back regularly for updates. If you haven’t read the previous posts in this series, give them a read first.
Traveling back a year in time, we were building the foundation of a system for managing your infrastructure in one place. Similar to existing infrastructure-as-code solutions, connectors to cloud service providers (referred to as providers in this post), written in multiple languages, would be given instructions to create, read, update, or delete cloud resources. Our system would generate a plan to converge the actual state with the desired state.
The central primitive of Anzu back then was the provider resource. This could have been an S3 bucket, RDS instance, or anything else that could be deployed and configured, really. Each resource would receive input values and return output values, which could be used as input values for other resources.
Through this system of reusing outputs with inputs, our system would generate a dependency graph. This way, we’d know the order in which resources had to be modified. We’ll go into the details of generating this graph for different operations in a future post, but for now, we know everything we need.
For these input and output values, we had to find a serialization format that could cross service boundaries: users would enter values in the frontend, we would store them in the backend and send them to the deployment job, where they would be converted into native data structures for the provider code to use.
In the following section, we’ll go through the initial requirements step by step, followed by the solutions we evaluated and eventually adopted.
Initial requirements
A value system that could evolve over time
From the beginning, we agreed on one thing: we wouldn’t know how the system would evolve over time, so whatever solution we came up with had to be flexible enough to accommodate changing use cases.
Native data structures for our provider code
Developers building providers for Anzu should have to deal with as little boilerplate as possible, especially around handling values. Ideally, they would receive inputs as built-in data types and return outputs the same way.
Basic schema validation
While it would not be possible to validate values semantically (this would have caused code duplication and possible drift on the provider side), we still needed to differentiate between basic data types like strings, numbers, lists, and nested values.
Visual editing
Back then, we had a strong conviction that a rich, visual editing experience could make the tedious work of configuring infrastructure more accessible to newly hired developers on our users’ teams. Since we were building for the web, we envisioned rich documentation for resource inputs and outputs, including multimedia elements like videos and guides.
Evaluating solutions
JSON Schema
Initially, I checked out JSON Schema as a way to streamline our schema validation. While JSON Schema offers a good experience for validating and serializing values, it didn’t meet our requirements for visual editing. This wasn’t exclusive to JSON Schema: most popular systems focus on validation rather than descriptiveness and multi-purpose use. We also wouldn’t have made use of most of the available validation rules, as we only cared about structural validation.
A unified experience using ASTs
From my time working on low-level GraphQL schema building and query translation, I had very positive experiences with ASTs and the way AST values could be composed into larger expressions and statements.
Usually, languages like GraphQL are written in their respective syntax and parsed into an AST as an intermediate format when work needs to be performed. The resulting AST is decoupled from indentation and other factors that are irrelevant when analyzing and compiling code. We skipped the language step and used an AST-like structure directly.
We introduced two parts that made up our value system, both written in a tree-like JSON structure: definitions and values.
In the next part, we’ll go over the fundamentals of Anzu’s value system to gain a deeper understanding of the possibilities we unlocked by rolling our own data format.
The Value AST
Basics
We’ll go over the fundamentals of the value system by looking at our TypeScript library, which was used across our internal services and for building providers in TypeScript, as well as at the code generation system that turned values into native types.
export enum ValueKind {
  Any = 'any',
  Scalar = 'scalar',
  List = 'list',
  Object = 'object',
  Map = 'map',
  Output = 'output',
  Configuration = 'configuration',
  Field = 'field',
  Function = 'function',
  FunctionArgument = 'functionArgument'
}
This enum outlines the possible value kinds our system ultimately handled. As you can see, they range from regular data types (scalars, lists, objects, maps) to more specialized kinds for our platform (outputs, configuration values, functions).
Some value kinds are only intended for inputs, such as output and configuration values, and functions with their arguments. These are internally resolved to output types that need to match the expected definition; we’ll cover an example later on.
Definitions
export interface ValueDefinitionBase {
  kind: string;
  isRequired?: boolean;
  isSecret?: boolean;
  rendering?: ValueRenderingOptions;
  resource?: ResourceOptions;
}
export interface ScalarDefinition extends ValueDefinitionBase {
  kind: ValueKind.Scalar;
  // The primitive type the serialized value must have (e.g. String)
  expectedUnderlyingType: ScalarUnderlyingType;
  // Whether the value spans multiple lines (e.g. larger text inputs)
  isMultiLine?: boolean;
}
Let’s say you’d like to build a provider for AWS so users can manage their S3 buckets within Anzu. In the provider configuration file, you would configure a resource that receives a list of inputs. To define these, you would provide value definitions according to the schema above.
You might want to define an ACL field where users can specify the canned ACL that grants bucket access. Since you’d need a string, you would use a scalar field with an expected underlying type of String. For a scalar field, users configuring the resource in the frontend would see a text input.
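Sticking with the S3 example, the ACL input could be described with a definition along the lines of the following sketch; the aclDefinition name and the ScalarUnderlyingType.String member are assumptions for illustration, not the exact provider configuration format.
const aclDefinition: ScalarDefinition = {
  kind: ValueKind.Scalar,
  // Assumed member name on the ScalarUnderlyingType enum
  expectedUnderlyingType: ScalarUnderlyingType.String,
  // The ACL can be omitted, so the input stays optional
  isRequired: false,
};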
Values
export interface ValueBase {
  isSecret?: boolean;
}
export interface ScalarValue extends ValueBase {
  kind: ValueKind.Scalar;
  // The actual value, serialized as a string
  serializedValue: string;
  underlyingType: ScalarUnderlyingType;
}
Values fit their definitions and can take the usual types found there, as well as some platform-specific extras like outputs, configuration values, and functions.
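A value for the ACL input from the earlier example, as entered in the frontend, might then look like this. The OutputValue shape below (with resourceId and outputName) is purely an assumption to illustrate how a platform-specific value could reference another resource’s output.
// A literal value entered by the user
const aclValue: ScalarValue = {
  kind: ValueKind.Scalar,
  serializedValue: 'private',
  underlyingType: ScalarUnderlyingType.String,
};
// Hypothetical shape of an output value referencing another resource's output
export interface OutputValue extends ValueBase {
  kind: ValueKind.Output;
  resourceId: string;
  outputName: string;
}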
Schema Validation
To ensure a value is valid, we can walk the definition and compare the value side by side. If we step into a nested layer in the definition (say we validate an object which has a list of fields which in turn have a name and a value), we also step into the nested layer in the value itself. This way, we can recursively validate that the value fits the definition.
For special values like outputs, configuration values, or function values, we need to retrieve the referenced definition to figure out whether it fits. This is the case when the expected input definition matches the output/configuration/return value definition (so when a string is expected, the referenced output should be a string, too).
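To make the walk concrete, here is a heavily simplified sketch that only handles scalars and lists; the ListDefinition and ListValue shapes (with elementDefinition and items) are assumptions for illustration, not the actual library interfaces.
// Hypothetical list shapes, mirroring the scalar interfaces above
interface ListDefinition extends ValueDefinitionBase {
  kind: ValueKind.List;
  elementDefinition: ValueDefinition;
}
interface ListValue extends ValueBase {
  kind: ValueKind.List;
  items: Value[];
}
type ValueDefinition = ScalarDefinition | ListDefinition;
type Value = ScalarValue | ListValue;
function validate(definition: ValueDefinition, value: Value): boolean {
  switch (definition.kind) {
    case ValueKind.Scalar:
      // The value must be a scalar with the expected primitive type
      return value.kind === ValueKind.Scalar &&
        value.underlyingType === definition.expectedUnderlyingType;
    case ValueKind.List:
      // Step into the nested layer on both sides and validate each element
      return value.kind === ValueKind.List &&
        value.items.every((item) => validate(definition.elementDefinition, item));
    default:
      // Objects, maps, and the platform-specific kinds are omitted in this sketch
      return false;
  }
}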
Signatures
When we needed to list all values that would fit a given input, we had to think about how we stored values and their definitions in the database. If we just stored the definition as JSON, we would have had to walk every row and compare it to the input definition at hand to see whether the values fit. Imagine having to find out whether a given object x can fit an input that accepts an object y.
For this case, we introduced signatures (fingerprints of definitions) that were stored next to the definition itself. This denormalization step made it easy to query suggestions for any input: we just had to generate (or retrieve) the signature of the input (e.g. string! for a required string) and find all definitions of outputs or configuration values with a matching signature. If the field was optional, required definitions were, of course, also accepted.
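Here is a minimal sketch of deriving such a fingerprint, reusing the assumed ListDefinition shape from the validation sketch; only the string!-style format is taken from the example above, everything else is illustrative.
// Derive a signature like "string!" for a required string or
// "[string!]" for an optional list of required strings
function signature(definition: ValueDefinition): string {
  const required = definition.isRequired ? '!' : '';
  switch (definition.kind) {
    case ValueKind.Scalar:
      return `${definition.expectedUnderlyingType}${required}`;
    case ValueKind.List:
      return `[${signature(definition.elementDefinition)}]${required}`;
  }
}
For an optional string input, a lookup could then accept both the string and string! signatures, matching the rule above.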
Dependency resolution
With an easily walkable tree structure, we could also retrieve all resource dependencies for a given value: we would walk each value, collect all the output values we found, and from there determine the resources they belonged to. This way, our value system assisted the plan generation step by providing all the links between resources.
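Reusing the assumed OutputValue and list shapes from the earlier sketches, the collection step could look roughly like this:
type AnyValue = ScalarValue | ListValue | OutputValue;
// Walk a value tree and collect the IDs of all resources it depends on
function collectDependencies(value: AnyValue, into = new Set<string>()): Set<string> {
  switch (value.kind) {
    case ValueKind.Output:
      // An output value links the surrounding value to the resource it reads from
      into.add(value.resourceId);
      break;
    case ValueKind.List:
      for (const item of value.items) {
        collectDependencies(item, into);
      }
      break;
    default:
      // Scalars and other leaf kinds carry no dependencies
      break;
  }
  return into;
}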
Code generation
Lastly, we could walk the value definitions to produce types and (de-)serialization methods for our input and output values.
// Raw inputs as provided by the job handler
type S3BucketInput []value.Input

// Generated, type-safe representation of the inputs
type TypedS3BucketInput struct {
    ACL string
}

// Generated deserialization method
func (i *S3BucketInput) parse() *TypedS3BucketInput { ... }

func (b *S3Bucket) Create(inputs S3BucketInput) *TypedS3BucketOutput {
    parsed := inputs.parse()
    ...
    return &TypedS3BucketOutput{...}
}
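The generator behind this was essentially another walk over the value definitions. Here is a trimmed-down sketch of what emitting such a struct could look like; goTypes, goFieldName, and emitGoInputStruct are illustrative names and mappings, not our actual generator.
// Hypothetical mapping from scalar underlying types to Go types
const goTypes: Record<string, string> = {
  string: 'string',
  number: 'float64',
  boolean: 'bool',
};
// Turn an input name into an exported Go field name
function goFieldName(name: string): string {
  return name.charAt(0).toUpperCase() + name.slice(1);
}
// Emit a Go struct with one typed field per named scalar input definition
function emitGoInputStruct(resourceName: string, inputs: Record<string, ScalarDefinition>): string {
  const fields = Object.entries(inputs)
    .map(([name, def]) => `\t${goFieldName(name)} ${goTypes[String(def.expectedUnderlyingType)] ?? 'string'}`)
    .join('\n');
  return `type Typed${resourceName}Input struct {\n${fields}\n}`;
}
Calling emitGoInputStruct('S3Bucket', { acl: aclDefinition }) with the definition from earlier would produce a struct along the lines of TypedS3BucketInput above.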
Internally, our job handler would retrieve all input and output values and provide them to the lifecycle hook method (create, read, update, or delete), where the developer could invoke the generated (de-)serialization methods to work with a type-safe representation of the values.
Other use cases
While we originally envisioned the value system exclusively for configuring resources, we ended up using it for our services feature, where environment variables could be provided in a type-safe way, and for service connections, where a provider could generate code for you (for example, to connect to a database managed in Anzu). This meant we could bridge the gap between configuring cloud infrastructure and running services, all in a type-safe fashion.
Wrapping up
While the problem Anzu tackled didn’t fit the current environment, the value system was a big success and met every requirement we set for it. Defining a schema and adding code generation on top wasn’t a novel idea, sure, but adding our unique constraints to the foundation of the platform helped us build a reliable system for managing infrastructure much faster.