This post is the second part of a series exploring the benefits and drawbacks of engineering for the application and database layers. If you haven’t done so yet, check out the first post on reducing database roundtrips.
When you’re working with databases, you may be inclined to offload a bigger part of your application code to the database. This approach can come in many forms, from larger queries that include data transformation, aggregation, and summarization, to multi-statement batches that are sent at once and evaluated outside of the application.
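To make the first form concrete, here’s a rough sketch (the orders schema below is invented for illustration) of a query that keeps aggregation and summarization inside PostgreSQL, so only a handful of summary rows ever cross the network:

```go
package example

// Hypothetical reporting query: grouping, counting, and summing all
// happen inside PostgreSQL, so the application receives a few summary
// rows instead of every order row.
const monthlyOrderStats = `
    SELECT date_trunc('month', created_at) AS month,
           count(*)                        AS order_count,
           sum(total_cents)                AS revenue_cents
    FROM   orders
    WHERE  created_at >= now() - interval '1 year'
    GROUP  BY 1
    ORDER  BY 1;
`
```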
When we built the underlying infrastructure for serving content at Hygraph, we decided that we wanted to minimize network roundtrips and have our PostgreSQL databases carry most of the load. The background for this design decision was that we had grown confident that PostgreSQL was mature and performant enough to handle requests almost entirely on its own, and we saw the downsides of sending endless requests over the network. This decision influenced the paradigms and patterns we chose for the related service in multiple ways, moving it away from the traditional application-database relationship.
The traditional service-database relationship
Usually, when you build an application that uses a database for persistence, the application remains in control: the whole execution flow is driven by it, branches are evaluated within the application, and the database is only used for retrieving or writing data.
In this scenario, you send multiple network requests back and forth, as you have to fetch both the context needed to make branching decisions and the data required for building a response.
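A minimal sketch of that flow in Go, assuming a hypothetical users/documents schema, could look like this. Note the two separate roundtrips: one for the branching context, one for the actual data:

```go
package example

import (
	"context"
	"database/sql"
)

// loadDocument fetches branching context first, then the response data:
// two separate roundtrips. The users/documents schema is invented.
func loadDocument(ctx context.Context, db *sql.DB, userID, docID int) (string, error) {
	// Roundtrip 1: fetch the context the application branches on.
	var role string
	if err := db.QueryRowContext(ctx,
		`SELECT role FROM users WHERE id = $1`, userID).Scan(&role); err != nil {
		return "", err
	}

	// The branch is evaluated in the application, not in the database.
	query := `SELECT content FROM documents WHERE id = $1 AND published`
	if role == "admin" {
		query = `SELECT content FROM documents WHERE id = $1`
	}

	// Roundtrip 2: fetch the data required for the response.
	var content string
	if err := db.QueryRowContext(ctx, query, docID).Scan(&content); err != nil {
		return "", err
	}
	return content, nil
}
```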
Debugging an application like this should be pretty straightforward: starting up a local database and attaching a debugger to the application isn’t a novelty anymore.
When the application decides what happens, you can also vary the compute resources your databases consume simply by choosing which operations to run in the database and which to keep in the application.
Zero-knowledge approach
Moving away from this approach, we can push more of that context into the database instead. This essentially turns the application into a stateless request handler that prepares queries for the database to run, sends them in a batch, and receives individual results (and errors).
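Using pgx, a common PostgreSQL driver for Go, such a batch could look roughly like the sketch below (the schema is again invented): all statements are queued up front, sent in a single roundtrip, and their results are consumed afterwards:

```go
package example

import (
	"context"

	"github.com/jackc/pgx/v5"
)

// runBatch queues several statements and sends them to PostgreSQL in a
// single network roundtrip. The documents schema is hypothetical.
func runBatch(ctx context.Context, conn *pgx.Conn, docID int) (string, error) {
	batch := &pgx.Batch{}
	batch.Queue(`UPDATE documents SET views = views + 1 WHERE id = $1`, docID)
	batch.Queue(`SELECT content FROM documents WHERE id = $1`, docID)

	// One roundtrip; individual results come back in queue order.
	results := conn.SendBatch(ctx, batch)
	defer results.Close()

	// Result of the UPDATE.
	if _, err := results.Exec(); err != nil {
		return "", err
	}

	// Result of the SELECT.
	var content string
	if err := results.QueryRow().Scan(&content); err != nil {
		return "", err
	}
	return content, nil
}
```

Note that the results have to be read back in the same order the statements were queued.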
With this, we can leverage the database for more complex operations close to our data. To perform branching operations, we require some control structures within the database.
Imagine you were to move permission checks or an upsert operation into the database. You’d need to add filter statements to most queries, or create branches depending on a condition (if a row exists, all subsequent statements need to account for that; if it doesn’t, a different path has to be executed).
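Both patterns can be approximated in plain SQL without an extra roundtrip. Here’s a hedged sketch against an invented schema:

```go
package example

// Upsert: the "does it exist?" branch is handled by the database via
// ON CONFLICT instead of a SELECT-then-INSERT in the application.
// Schema is hypothetical.
const upsertSetting = `
    INSERT INTO settings (user_id, key, value)
    VALUES ($1, $2, $3)
    ON CONFLICT (user_id, key) DO UPDATE SET value = EXCLUDED.value;
`

// Permission check folded into the query as a filter: rows only come
// back if a matching grant exists, so the application never has to
// branch on the caller's role. Also hypothetical.
const readDocument = `
    SELECT d.content
    FROM   documents d
    WHERE  d.id = $1
      AND  EXISTS (
             SELECT 1 FROM permissions p
             WHERE  p.document_id = d.id
               AND  p.user_id = $2
               AND  p.can_read
           );
`
```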
Eventually, you’ll have rebuilt a regular runtime environment, just out of database structures: local variables become rows in a temporary table, and blocks of statements become sequences of conditional statements.
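As a hypothetical illustration, a “local variable” rebuilt out of database structures might look like this:

```go
package example

// A "local variable" as a temporary table: the first statement stores
// the value, later statements in the same transaction read it back.
// Entirely invented, for illustration only.
const tempVariable = `
    CREATE TEMP TABLE vars ON COMMIT DROP AS
    SELECT id AS target_id FROM documents WHERE slug = 'about';

    UPDATE documents
    SET    published = true
    WHERE  id IN (SELECT target_id FROM vars);
`
```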
By now, you’ve probably realized that the more we move away from application code and toward database statements, the harder it becomes to reason about and debug our application. You can’t just fake the database; you need the real thing, and the only feedback you get is the query result. No breakpoints, just rows.
Once your product grows, you might also want to add additional services like search providers, other databases, or caches. If everything depends on logic buried in one database, integrating them becomes close to impossible.
At Hygraph, we incrementally moved more logic into the database. With more features shipped and more operational knowledge gained, we recently started planning architectural changes to gradually decouple the application from the database again: running more small statements and making decisions locally, rather than sending one big batch and seeing what happens.
As always, these decisions fall on a spectrum: one end is a completely decoupled and generic system, while the other extreme is running everything in one system.
There are always trade-offs to be made, and in the end, that’s what software engineering is about.