Jan 17, 2021

Centralized Validation with GraphQL Scalar Types

When building user-facing API products and services, validating inputs and enforcing limits is hugely important. Often, this involves repeating similar steps to check given parameters and arguments. Since you can't really trust user-supplied data until it is validated, you're usually required to validate arguments and then perform your business logic. Repeating this in multiple places is cumbersome and opens up possibilities for code duplication, missing validation steps, or other unintended side effects.

If you are serving a GraphQL API, these problems may still occur. However, you now have the opportunity to refactor your validation to one central point, using a schema-driven approach: custom scalar types.

🍃 Scalar Types

Scalar types are the atomic units of your schema. Once the execution layer reaches a scalar, it does not walk down the AST any further, as any field using a scalar type is a leaf node. GraphQL ships with five built-in scalar types: ID, String, Float, Int, and Boolean.

You can also add your own, custom scalar types to the schema, for example types for Email, JSON, phone numbers, or any other data type you use.

This is already incredibly useful for tailoring your schema and data graph to your specific context, e.g. by using scalar types that describe data specific to your operations, but you can do even more.

When the execution process walks over a scalar field, it will try to process it in two ways:

  • If it is an input field, it will either parse the inline value if you supplied a value in the query document itself or use the value of a variable instead. It will then pass the resulting value to further resolution, for example, your query or mutation resolvers.
  • If it is a regular field that is returned as part of the response, you have the opportunity to serialize the value your resolvers returned, for example stringifying or wrapping it, to meet the expectations of your clients.
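The two directions described above can be sketched with a pair of plain functions. This is a minimal illustration for a hypothetical DateTime scalar; the function names and error messages are assumptions made for this example and are not part of graphql-js.

```typescript
// Hypothetical DateTime scalar logic; all names here are illustrative.

// Input direction: client-supplied value -> value your resolvers receive.
function parseDateTime(value: unknown): Date {
  if (typeof value !== 'string') {
    throw new TypeError('DateTime must be supplied as a string');
  }
  const date = new Date(value);
  // An unparseable string yields an "Invalid Date" whose timestamp is NaN.
  if (Number.isNaN(date.getTime())) {
    throw new TypeError(`Invalid DateTime: ${value}`);
  }
  return date;
}

// Output direction: value returned by a resolver -> value sent to the client.
function serializeDateTime(value: unknown): string {
  if (!(value instanceof Date) || Number.isNaN(value.getTime())) {
    throw new TypeError('DateTime resolvers must return a valid Date');
  }
  return value.toISOString();
}
```

With this split, resolvers only ever work with Date objects, while clients only ever see ISO 8601 strings.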

💂‍♀️ Validation with Scalar Types

Now that we know that scalar types can be used to make your schema more descriptive, we can check out an additional use case: validation.

In the same step in which we parse the supplied value and return it for subsequent resolvers, we can run any validation logic on it. As a simple example, say we want to support a Json scalar type that takes in a stringified JSON value and passes the parsed data on to our resolvers. In the API response, it simply returns the data as it is. In our schema, we declare the new scalar type and use it in a mutation.

scalar Json

type Mutation {
  doSomething(data: Json!): Json
}

While we can now start up our GraphQL API, the execution layer does not really know how to parse or return Json values. That's the next part:

import { GraphQLScalarType, Kind } from 'graphql';

const parseAndValidateJson = (value: string): unknown => {
  try {
    const parsed = JSON.parse(value);
    return parsed;
  } catch (err) {
    throw new Error('Invalid JSON, please supply a valid value');
  }
};

export const JsonScalar = new GraphQLScalarType({
  name: 'Json',
  description: 'JSON custom scalar type',

  // Validate client value
  parseValue(value: unknown) {
    if (typeof value !== 'string') {
      throw new Error('Expected string value');
    }
    return parseAndValidateJson(value);
  },

  // Validate client literal value
  parseLiteral(ast) {
    if (ast.kind === Kind.NULL) {
      return null;
    }

    if (ast.kind !== Kind.STRING) {
      throw new Error('Expected literal kind to be String for JSON values');
    }

    return parseAndValidateJson(ast.value);
  },

  // Simply return server-side value to client
  serialize(value: unknown) {
    return value;
  }
});

In this snippet, we declare a new GraphQL scalar type. This uses the graphql-js reference implementation for JavaScript, and defines a new type with a name, description, and a couple of important functions:

  • parseValue receives an externally-supplied value used for inputs (e.g. variable values) and returns the value used internally from that point on.

  • parseLiteral receives values that are hard-coded in the query document, e.g.

    query sample {
      doSomething(data: "{\"hardCoded\": true}")
    }
    
  • serialize converts the values returned in resolving to a serialized representation (e.g. when returning Date objects, you might want to return a date string to the API consumer)

In our case, in addition to parsing the JSON value, we also handle the errors that occur when the value is invalid, and return a fitting error message in those cases.

Once you have defined the scalar type, you can add it to your schema resolvers. After that, your resolvers can trust the data they receive, because it has already been validated. You no longer have to write validation logic in every resolver: it lives in one place and can be tested there.

Now you just have to annotate every field that receives or returns JSON values with the respective scalar type, and you're set!
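Because the validation lives in a single function, it can be exercised without starting a server. The sketch below repeats parseAndValidateJson from the scalar definition so that it runs on its own; messageThrownBy is a hypothetical helper written for this example, not part of any library.

```typescript
// Same helper as in the scalar definition, repeated to keep this sketch self-contained.
const parseAndValidateJson = (value: string): unknown => {
  try {
    return JSON.parse(value);
  } catch (err) {
    throw new Error('Invalid JSON, please supply a valid value');
  }
};

// Hypothetical helper: returns the message of the error thrown by fn,
// or null if fn completed without throwing.
const messageThrownBy = (fn: () => unknown): string | null => {
  try {
    fn();
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
};

// Valid input comes back as parsed data.
const parsed = parseAndValidateJson('{"hardCoded": true}');
console.log(parsed);

// Invalid input is rejected with the client-facing message.
console.log(messageThrownBy(() => parseAndValidateJson('{not json')));
```

In a real project these checks would live in your test runner of choice; the point is only that the scalar's logic is reachable as an ordinary function.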

As a side note: you do not need to introduce a new custom scalar type just to validate or parse a built-in type differently. For built-in types, you can create a scalar type object in the same way as above and add it to your resolvers. This could be used to validate ID values (for example, when strictly using UUIDs) without re-declaring the ID type. For built-in scalars, you also do not have to add a scalar type definition to your schema.
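As a rough sketch of the UUID case, the validation itself could look like the function below. The names UUID_PATTERN and parseUuidId are assumptions made for this example, and lowercasing the result is a design choice rather than a requirement.

```typescript
// Matches the canonical 8-4-4-4-12 hexadecimal form of a UUID (any version).
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical parser for ID inputs that only accepts UUID strings.
function parseUuidId(value: unknown): string {
  if (typeof value !== 'string' || !UUID_PATTERN.test(value)) {
    throw new TypeError('Expected ID to be a UUID');
  }
  // Normalize casing so resolvers always compare IDs in one form.
  return value.toLowerCase();
}
```

A function like this would serve as parseValue, with parseLiteral delegating to it after checking that the literal is a string, mirroring the Json scalar.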

🔒 Increasing type-safety with code generation

There's one more benefit you can enjoy when using custom scalar types with TypeScript: code generation. With tools like graphql-code-generator, you can customize the generated type definitions so that each scalar type matches the value your custom scalar produces. An example configuration for the aforementioned library could be:

overwrite: true
schema: 'schema.graphql'
generates:
  src/generated/graphql.ts:
    plugins:
      - 'typescript':
          # Should be in sync with scalar parsing behaviour
          scalars:
            Email: string
            DateTime: Date
            Json: unknown

This configures the TypeScript plugin to type Email values as strings, DateTime values as JavaScript Dates (which we parsed and validated in the custom scalar code), and Json values as unknown, as they may still contain user-supplied structures.
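To illustrate what this buys you in a resolver, here is a hand-written approximation of the resulting types. The exact shape of the generated Scalars type differs between generator versions, and MutationDoSomethingArgs, isRecord, and doSomething are names assumed for this sketch.

```typescript
// Hand-written stand-in for the generated scalar mapping (real output varies by version).
type Scalars = {
  Email: string;
  DateTime: Date;
  Json: unknown;
};

// Assumed argument type for the doSomething(data: Json!) mutation.
type MutationDoSomethingArgs = {
  data: Scalars['Json'];
};

// Type guard that narrows an unknown value to a plain object.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Because Json is typed as unknown, the compiler forces a check
// before the structure of the value can be used.
function doSomething(args: MutationDoSomethingArgs): string[] {
  if (!isRecord(args.data)) {
    throw new Error('Expected a JSON object');
  }
  return Object.keys(args.data);
}
```

The scalar guarantees that the value is valid JSON, while the unknown type keeps the resolver honest about not yet knowing its structure.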

Given this setup, you can configure validation once, annotate all occurrences in your schema, and enjoy a type-safe resolving experience.

📚 Resources on (custom) scalar types