Oct 30, 2020

When (not) to use pg_dump

When working with PostgreSQL databases, pg_dump is a fantastic tool for exporting the current state of a database and restoring it, either using pg_restore or any other database client. You can specify specific schemas and tables to export, include or exclude content, and configure a lot of other settings as well. And, theoretically, it should work on live systems, as it doesn't block any ongoing read/write operations.

So far, so good. After a couple of months running it in production, it started to seem that pg_dump would become slower on growing databases, which seemed odd, given the choice of exporting only specified schemas. So I took some time and dug into its internals, finding some interesting details about how it works.

🐘 Setting up a debugging environment

To get a deeper look into how pg_dump operates, I cloned the source code and set up everything needed for building it

$ git clone git://git.postgresql.org/git/postgresql.git
$ cd postgresql
$ ./configure

The configure script will automatically set up all Makefiles needed for compiling Postgres from source.

To build pg_dump, head over to the subdirectory and run make as follows

$ cd src/bin/pg_dump
$ make pg_dump
$ ./pg_dump --help

🚨 Unfiltered SELECTs

First, I added some simple logging statements to understand how long pg_dump spent on specific parts, as well as how many resources were processed. In pg_dump.c#getSchemaData, we can see several read operations, retrieving namespaces, tables, and other database resources to be exported.

If we take a closer look into those operations, we'll quickly notice that most operations aren't limited to the given schemas, but instead load all resources present in the database, e.g.

Loading namespaces

 * getNamespaces:
 *	  read all namespaces in the system catalogs and return them in the
 * NamespaceInfo* structure
 *	numNamespaces is set to the number of namespaces read in
NamespaceInfo *
getNamespaces(Archive *fout, int *numNamespaces)

pg_dump will load all namespaces so objects retrieved later on can be linked to their parent namespace. For systems with large numbers of schemas, this will scale linearly to the schema count. While it's a simple operation, retrieving all namespaces in a database with 10k+ schemas took a couple of seconds.

Loading user-defined tables

 * getTables
 *	  read all the tables (no indexes)
 * in the system catalogs return them in the TableInfo* structure
 * numTables is set to the number of tables read in
TableInfo *
getTables(Archive *fout, int *numTables)

Similar to getNamespaces, getTables will load all user-defined tables and table-like objects in the database. This is necessary for identifying inherited columns, owned sequences, and other objects later on. Loading all ~150k tables took around 10s.

Loading dependencies

 * getDependencies --- obtain available dependency data
static void
getDependencies(Archive *fout)

Reading dependency data was the operation that took up most of the time, by far. While the actual query execution was around 3 seconds, fetching the data (effectively millions of rows) took up almost 30 seconds.

All operations outlined above will run every time you invoke pg_dump, whether it's targeted at a single schema or hundreds. Unfortunately, some of the steps that a required for retrieving all objects can take up considerable amounts of time, making time-critical use cases less feasible.

Of course, I'm a complete outsider when it comes to the architectural decisions taken when building and improving pg_dump over the years, but I'd be curious if there were attempts to limit what needs to be fetched, especially in database environments with many schemas. It seems like pg_dump wasn't optimized for these cases, which can make for hard times.

Reading through mailing list threads from a couple of years ago had me stumbling over similar case descriptions, where a high number of schemas resulted in decreased performance, which was fixed at the time. It seems there might be more to it still. It's also quite fun to see that some parts of the code were touched 17 years ago, which underlines the scope of this project.

I'm not mad at why pg_dump works as it does, because it does a great job at what it promises to do. Using it for time-critical workloads just doesn't match the design, and that's fine.