In the era of Software-as-a-Service products, we have seen a transition from interacting through user interfaces to interacting through APIs. Some products are available mainly through such a communication layer, which enables automation and seamless integration between services.
Whether you use Stripe for billing, Shopify for commerce, AWS for a variety of services, or any other provider, you most likely used an internet-facing API, even if it was through a client library or another developer tool.
With developers expecting straightforward ways to integrate your service into their product, APIs have become more than a part of a service: they are the service.
So if you're building a service that exposes an API you expect your customers to use in the long term, make sure to think in the long run as well: Most companies in the API space, some of which I mentioned above, follow similar patterns in their designs, some of which I'll outline in this post.
You might notice that I have written about points such as versioning and pagination in the past. They are as relevant as ever for building predictable services, so I'll include them here as well.
As the product keeps on evolving, features change over time as well. While most changes should be implemented in a non-breaking fashion, sometimes you have to deprecate and remove an old part and introduce a new workflow. This might happen when you change how a certain concept is handled internally, or you perform significant architectural changes.
When you run into a breaking change, where existing customers may have to adjust their integrations, you have to communicate clearly what caused the decision. You then either support the old version indefinitely, or only for a limited, announced period in which your customers need to update their integrations.
Often it makes sense to deprecate old features in advance, introduce new ones, and remove the unsupported ones after the communicated period has passed, but if you have many changing features it becomes hard to keep track, which is why we require a more structured approach: Versioning.
Versioning simply means that you annotate features with a "version" tag that identifies the implementation at a certain point in time. When you make a change, you may introduce a new version. While concepts such as semantic versioning (1.0.0, i.e. major.minor.patch) are great for open-source products, you could also adopt date-based versions that match the time of release, for example, 2021.6 for the June 2021 release.
With this, you can collect all breaking changes in one place and tag them at the time of release. In addition to this, you can then deprecate older versions, so that you reduce the burden of maintaining multiple versions at each point in time.
Stripe, for example, uses an API version set per account throughout the whole product, not only for requests to the API but also for webhook deliveries. When they add breaking changes, they create a new version and allow users to upgrade to it, with a period in which they can revert the version and redeliver webhooks with the old API version. Stripe did not end support for any old API version to date.
Shopify releases a new stable version of its API at the beginning of each quarter and supports each stable version for at least 12 months, after which it is removed in favor of newer releases. Because consecutive versions overlap, customers have about 9 months between the introduction of a new version and the removal of the previous one to upgrade their applications and integrations. When customers do not upgrade, Shopify processes their requests with the oldest available stable API version. Usually, customers are expected to upgrade to the newest version every quarter.
How you implement versioning for your product may vary depending on your product development processes, when changes are released, and if you anticipate breaking changes at all. If you do not immediately discard old features, supporting old versions must be balanced against the speed of putting out new features. You don't want to get bogged down by legacy code, but customers expect a reasonable level of stability.
For these reasons, I think the following should work out for most teams: add versioning centrally as a first-class concept instead of per feature, create a new version for each set of breaking changes, deprecate the old version at the same time, and remove it after a clearly communicated period that is not too far in the future.
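To make central, date-based versioning a bit more concrete, here is a minimal sketch. The version labels, removal dates, and the names `REMOVAL_DATES`, `active_versions`, and `resolve_version` are all made up for illustration. Falling forward to the oldest supported version mirrors the Shopify behavior described above; it is one possible policy, not the only reasonable one.

```python
from datetime import date

# Hypothetical registry: version label -> date on which the version is removed.
# None means no removal date has been announced yet.
REMOVAL_DATES = {
    "2021.3": date(2022, 3, 1),
    "2021.6": date(2022, 6, 1),
    "2021.9": None,
    "2021.12": None,
}


def version_key(label):
    """Sort labels numerically so that 2021.12 comes after 2021.9."""
    return tuple(int(part) for part in label.split("."))


def active_versions(today):
    """Return all versions that are still served on the given day, oldest first."""
    return sorted(
        (
            label
            for label, removed_on in REMOVAL_DATES.items()
            if removed_on is None or today < removed_on
        ),
        key=version_key,
    )


def resolve_version(requested, today):
    """Serve the requested version, or fall forward to the oldest active one."""
    active = active_versions(today)
    if requested in active:
        return requested
    return active[0]
```

A request pinned to a removed version is then served by the oldest version that is still supported, so the integration keeps working for as long as the changes in between allow.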
Who can read specific resources or perform actions shouldn't be a binary decision, and most customers will demand flexible permissions for different users and API consumers. To keep the complexity manageable, a permission system should apply to all actors in a system: real users, access tokens, and any other entities that send API requests. This doesn't necessarily mean you expose it the same way for each of them.
Users might be assigned to one or more roles with a set of permissions, while tokens may be assigned permissions directly. Adding permissions from the start helps to build a system with clear access control, which your users can adapt to their use case. Role-based access control (RBAC) is one of the most widely-used approaches to manage permissions for big organizations, so customers with multiple users will thank you.
The API should enforce permissions consistently: it should return the same kind of error whenever a permission is missing, and it should never expose resources that the requesting entity may not access.
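As a rough sketch of the idea that users get roles while tokens get permissions directly, and that both go through the same check: the role names, permission strings, and the `Actor`, `PermissionDenied`, and `require` names below are assumptions for illustration only.

```python
from dataclasses import dataclass, field

# Hypothetical roles and the permissions they bundle.
ROLE_PERMISSIONS = {
    "viewer": {"projects:read"},
    "editor": {"projects:read", "projects:write"},
}


class PermissionDenied(Exception):
    """Raised for every missing permission, so API errors stay consistent."""


@dataclass
class Actor:
    """A user, an access token, or any other entity that sends API requests."""

    name: str
    roles: set = field(default_factory=set)        # typically used for users
    permissions: set = field(default_factory=set)  # direct grants, e.g. tokens

    def effective_permissions(self):
        """Combine direct grants with the permissions of all assigned roles."""
        granted = set(self.permissions)
        for role in self.roles:
            granted |= ROLE_PERMISSIONS.get(role, set())
        return granted


def require(actor, permission):
    """Single enforcement point that every API handler would call."""
    if permission not in actor.effective_permissions():
        raise PermissionDenied(f"{actor.name} lacks {permission}")
```

Routing every handler through one function like `require` is what keeps the error behavior uniform across endpoints.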
Depending on your product, you may allow users to create resources like projects, send messages to other users, upload media like images or videos, or perform other tasks that your service offers. To make it easier to predict how your service will scale, you should create limits or quotas for most interactions, capping the upper bound of feature usage.
You might connect service quotas to subscription tiers or plans to allow more expensive plans to use your service more extensively while strictly limiting usage for lower tiers. This is extremely important to prevent unexpected performance issues when one customer impacts the experience for everyone else, especially in the long term.
Defining service quotas allows you to communicate technical and product limits clearly, negotiate with enterprise customers, and manage your costs more easily, as you can create realistic forecasts.
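Quotas tied to plans can start out as a simple lookup table. The plan names, resource names, and numbers below are invented, and `check_quota` is a hypothetical helper, so treat this as a sketch of the shape rather than a recommendation for specific limits.

```python
# Hypothetical plans and their upper bounds per resource.
PLAN_QUOTAS = {
    "free": {"projects": 3, "uploads_per_month": 100},
    "pro": {"projects": 50, "uploads_per_month": 10_000},
}


class QuotaExceeded(Exception):
    """Raised when an account has used up its quota for a resource."""


def check_quota(plan, resource, current_usage):
    """Return the remaining allowance, or raise if the quota is used up."""
    limit = PLAN_QUOTAS[plan][resource]
    if current_usage >= limit:
        raise QuotaExceeded(f"{resource} limit of {limit} reached on plan {plan}")
    return limit - current_usage
```

Calling this before creating a resource gives you one place to adjust limits, for example when negotiating a custom quota with an enterprise customer.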
Similar to service quotas, we need another set of measures to prevent one user from consuming too many resources, degrading the experience of other API consumers. A classic approach is to limit the requests a user may send to your infrastructure in a given period, for example, 5 requests per second.
There are multiple ways to implement rate limiting, and they differ in how the limit and the actual usage are calculated. One example is the leaky bucket approach: each request fills a virtual bucket, requests are rejected once the bucket is full, and the bucket contents simultaneously decrease ("leak") at a fixed rate until it is empty.
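A leaky bucket fits in a few lines. This is a single-process, in-memory sketch; the class name, parameters, and injectable `clock` are my own choices for illustration, and a real deployment would usually keep the bucket state in a shared store.

```python
import time


class LeakyBucket:
    """Requests fill the bucket; it drains at a constant rate over time."""

    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity    # maximum bucket contents
        self.leak_rate = leak_rate  # units drained per second
        self.clock = clock          # injectable so tests can control time
        self.level = 0.0
        self.last_checked = clock()

    def allow(self, cost=1.0):
        """Return True and fill the bucket if the request fits, else False."""
        now = self.clock()
        # Drain whatever leaked since the last call, never below empty.
        leaked = (now - self.last_checked) * self.leak_rate
        self.level = max(0.0, self.level - leaked)
        self.last_checked = now
        if self.level + cost > self.capacity:
            return False
        self.level += cost
        return True
```

With `capacity=5` and `leak_rate=5`, a client can burst up to 5 requests and then sustain roughly 5 requests per second, which matches the example limit mentioned earlier.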
When using GraphQL, you may also want to calculate query complexity, to prevent deeply-nested and other complex queries from degrading the service performance. Given a GraphQL operation document, you can calculate its complexity based on a set of rules assigning cost values to specific attributes (e.g. scalar fields vs. ones with selection sets, number of elements expected in paginated request, etc.).
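The cost rules can be sketched without any GraphQL library by modelling a query as nested dictionaries. The representation (a field maps to `None` for a scalar, or to a dict with `fields` and an optional page size `first`), the cost constants, and the limit are all assumptions for illustration; a real implementation would walk the parsed operation document instead.

```python
SCALAR_COST = 1   # assumed cost of a scalar field
OBJECT_COST = 2   # assumed cost of a field with its own selection set
MAX_COST = 50     # assumed per-query budget


def query_cost(fields):
    """Sum field costs, multiplying nested selections by the page size."""
    total = 0
    for node in fields.values():
        if node is None:
            total += SCALAR_COST
            continue
        page_size = node.get("first", 1)
        total += OBJECT_COST + page_size * query_cost(node["fields"])
    return total


def is_allowed(fields):
    """Reject queries whose estimated cost exceeds the budget."""
    return query_cost(fields) <= MAX_COST


# Ten projects, each with two scalars and a nested owner object.
example_query = {
    "projects": {
        "first": 10,
        "fields": {
            "id": None,
            "name": None,
            "owner": {"fields": {"email": None}},
        },
    }
}
```

Each project entry costs 1 + 1 + (2 + 1) = 5, so the whole query costs 2 + 10 * 5 = 52 and would be rejected under the assumed budget of 50.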
When implementing complexity calculation, you should measure its performance by comparing it with real complexity and load on the system. You might need to increase the cost when complex queries degrade performance or decrease the cost when too many queries are blocked. Similarly, you might need to tweak your calculation when real complexity differs too much.
After talking about limits and complexity, we continue with the theme of making your customers' usage predictable to know when to scale or move customers to dedicated environments. When you expose large data sets, for example, lists of resources, you should make sure never to return the full list at once, if you're dealing with dynamically generated or unlimited data sets. This could be the case for user-generated data or logs.
Even smaller lists can become quite expensive when using GraphQL, as your users might create complex queries for the list entries' contents. For this, you should check out how to calculate and enforce query complexity limits.
As a rule of thumb, you should always limit the number of returned items to a realistic amount you expect to handle without issues.
When you add a limit, you need a form of pagination to return the subsequent entries as well. Usually, people implement offset-based pagination rather than cursor-based pagination. Offsets might be easier to get started with, but they show performance issues on large data sets in database systems such as PostgreSQL: the database has to read and discard all rows before the offset before it can return the requested page, rather than filtering down early by making use of indices.
For these reasons, when dealing with relational databases, you can make use of cursor-based pagination. A cursor is a small piece of data that tells your database where to resume loading data from in subsequent requests. Usually, you can use your primary key as a cursor, and order by it at the same time. This guarantees consistency when inserting or deleting rows in between requests, as subsequent requests will pick up items where we left off in previous ones.
Using cursors does not prevent you from creating multi-attribute ordering, and if you do not use unique attributes, you can still fall back on the primary key, making sure you always have a clear tiebreaker.
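Here is a minimal sketch of cursor-based pagination with a non-unique sort attribute and the primary key as tiebreaker. It uses an in-memory SQLite database to stay self-contained; the `items` table, its columns, and the `make_demo_db` and `fetch_page` helpers are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table where some rows share the same created_at."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, created_at TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (id, created_at, name) VALUES (?, ?, ?)",
        [
            (1, "2021-06-01", "a"),
            (2, "2021-06-01", "b"),
            (3, "2021-06-02", "c"),
            (4, "2021-06-02", "d"),
            (5, "2021-06-03", "e"),
        ],
    )
    return conn


def fetch_page(conn, limit, after=None):
    """Return (rows, next_cursor); the cursor is (created_at, id) of the last row."""
    if after is None:
        rows = conn.execute(
            "SELECT id, created_at, name FROM items "
            "ORDER BY created_at, id LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        created_at, last_id = after
        # Resume strictly after the cursor; id breaks ties on created_at.
        rows = conn.execute(
            "SELECT id, created_at, name FROM items "
            "WHERE created_at > ? OR (created_at = ? AND id > ?) "
            "ORDER BY created_at, id LIMIT ?",
            (created_at, created_at, last_id, limit),
        ).fetchall()
    # A full page may have more rows behind it; a short page is the last one.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor
```

In an API response you would typically encode the cursor into an opaque string, so clients cannot depend on its internal structure.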
Stripe offers cursor-based pagination for fetching data in bulk, using starting_after and ending_before cursors with a limit parameter.
When you offer a service that exposes critical functionality such as charging customers, confirming or placing orders, or performing any action that should run only once irrespective of network connection issues, you should make these operations idempotent, so that retried requests do not accidentally cause conflicts or duplicate actions.
Idempotency can be implemented by storing a user-generated request key, optionally recording the internal response, and always returning this response for subsequent requests instead of performing the same action again. You can of course communicate that idempotency keys might expire after a given period, reducing the data you are required to store in the long term.
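The approach described above can be sketched as a small in-memory store. The `IdempotencyStore` name, the default expiry of 24 hours, and the injectable `clock` are assumptions; a production version would persist keys in a database and also guard against two concurrent requests using the same key, which this sketch does not do.

```python
import time


class IdempotencyStore:
    """Remember the response per client-supplied key for a limited time."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._responses = {}  # key -> (stored_at, response)

    def run(self, key, action):
        """Run action once per key; replay the stored response afterwards."""
        now = self.clock()
        entry = self._responses.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        # First request for this key, or the key has expired: perform the action.
        response = action()
        self._responses[key] = (now, response)
        return response
```

A handler would wrap the critical operation, for example `store.run(request_key, lambda: create_charge(...))`, where `create_charge` stands in for whatever action must not run twice.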
Stripe offers idempotent requests by allowing users to supply an Idempotency-Key header, which will make sure requests using the same value will only end up in one operation, the result of which is stored and returned for every subsequent call.
Shopify offers idempotent requests as well, allowing users to supply a unique token or idempotency key.
When running your service in multiple environments or generating different token kinds for your customers to provide in their API requests, it can be helpful to prefix tokens with an identifier providing information on the type of the token (personal access token, secret token, etc.) and service environment (service name + staging/production/etc.), for example, ghp_example (GitHub personal access token) or sk_live_example (Stripe secret key in live mode).
Not only is this useful for debugging, it can also help with detecting accidentally-published secret keys, as GitHub outlined in a recent post on their new token format.
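A prefixed token format could look like the following sketch. The `sk`/`pat` kinds, the `live`/`test` environments, and the 32-character hex secret are made-up conventions for this example and do not reproduce GitHub's or Stripe's actual formats.

```python
import re
import secrets

# Hypothetical prefixes for this example service.
TOKEN_KINDS = {"sk": "secret key", "pat": "personal access token"}
ENVIRONMENTS = {"live", "test"}

# A fixed, recognizable shape makes leaked tokens easy to scan for.
TOKEN_PATTERN = re.compile(r"\b(?:sk|pat)_(?:live|test)_[0-9a-f]{32}\b")


def generate_token(kind, environment):
    """Build a token such as sk_live_<32 hex characters>."""
    if kind not in TOKEN_KINDS or environment not in ENVIRONMENTS:
        raise ValueError("unknown token kind or environment")
    return f"{kind}_{environment}_{secrets.token_hex(16)}"


def parse_token(token):
    """Read the kind and environment from the prefix without a database lookup."""
    parts = token.split("_", 2)
    if len(parts) != 3 or parts[0] not in TOKEN_KINDS or parts[1] not in ENVIRONMENTS:
        raise ValueError("unrecognized token format")
    return {"kind": TOKEN_KINDS[parts[0]], "environment": parts[1]}


def find_tokens(text):
    """Return every string in text that looks like one of our tokens."""
    return TOKEN_PATTERN.findall(text)
```

The same pattern can run in a pre-commit hook or a log scrubber, which is where the recognizable prefix pays off.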
For investigating failed requests, it is important to identify which request to look at, for example, when a customer reports an error they received. This can be done by generating a unique request identifier, which is not only returned in the response but also used for monitoring and observability tooling and tracing.
When a user then files a support ticket including the request ID (make sure to publicly document how to obtain it and make it as easy as possible to provide in communications with customer support), your team can immediately retrieve details such as the status code, detailed traces or logs, and even cross-service traces when using distributed tracing.
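A request ID can be attached in one central wrapper. In this sketch the `req_` prefix, the `X-Request-Id` header name, and the `handle_request` function are assumptions; use whatever your framework and tracing setup already provide.

```python
import logging
import uuid

logger = logging.getLogger("api")

REQUEST_ID_HEADER = "X-Request-Id"  # assumed header name


def handle_request(handler):
    """Run a handler and tag the response and the logs with one request ID."""
    request_id = f"req_{uuid.uuid4().hex}"
    try:
        status, body = handler()
    except Exception:
        # Log the full traceback under the same ID the customer will see.
        logger.exception("request %s failed", request_id)
        status = 500
        body = {"error": "internal_error", "request_id": request_id}
    logger.info("request %s finished with status %d", request_id, status)
    return status, {REQUEST_ID_HEADER: request_id}, body
```

Including the ID in the error body as well as the header makes it more likely that it ends up in the support ticket.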
Last but not least, testing. This time we're not talking about testing your own service but about making it easier for your customers to test their integration with it. A common drawback of external services is that testing different cases without complex setups is nearly impossible, but this does not have to be the case.
Adding ways for your users to test their integration with your service will lead to fewer issues arising from unhandled cases or wrong assumptions and miscommunication.
Stripe provides a test mode and allows developers to provide specific testing card numbers and other tokens to invoke special cases such as succeeding or failing payments, disputes, and other lifecycle steps that need to be handled.
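If you want to offer something similar, a test mode can start as a lookup of special tokens that force a specific outcome. The token names and statuses below are invented for this sketch and are not Stripe's actual test values.

```python
# Hypothetical test tokens and the outcome each one forces.
TEST_OUTCOMES = {
    "tok_test_success": "succeeded",
    "tok_test_declined": "declined",
    "tok_test_disputed": "disputed",
}


def charge(token, amount, live_mode=False):
    """Simulate a charge in test mode based on the supplied test token."""
    if live_mode:
        raise NotImplementedError("live processing is out of scope for this sketch")
    if token not in TEST_OUTCOMES:
        raise ValueError("unknown test token")
    return {"amount": amount, "status": TEST_OUTCOMES[token], "livemode": False}
```

Customers can then cover each lifecycle step in their own test suite without any complex setup on your side.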
I hope you enjoyed this overview of API features and architectural decisions that can help make your API stand the test of time, with predictable performance, satisfied customers, fewer fires to fight, and fewer long nights spent on incidents.