Aug 17, 2019

JavaScript Generators And What They're Used For

Unlike most JavaScript language features introduced over the last few years, generators always felt weird to me from an end-user perspective. Sure, having a function that "can be exited and later re-entered" sounds neat and all, but what would you actually use that for in typical web applications?

I've heard that Babel, Webpack, and other libraries rely heavily on this feature under the hood, but I never had that "oh, that's where I should use a generator!" moment, that is, until I recently encountered an interesting situation.

Let's first reiterate (I hope you see what I did there once you've read the next paragraphs) the basics of generators and iterators in modern JavaScript.

🔬 What's a generator?

As already teased above, generators are special functions, introduced in ES2015/ES6, that can pause and later be re-entered using the yield keyword. Defined with a star after the function keyword (e.g. function* doSomething), each call returns an iterator object, which is consumed manually or with the for...of loop syntax (for await...of for async generators). Once no more content should be returned, generator functions can "exit" using the regular return keyword. This transitions the underlying iterator to a done state.

If you want to learn more about generators, please head over to MDN
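To make this less abstract, here's a minimal sketch (countTo is a made-up function just for illustration) of a generator that yields an increasing counter and then exits:

// A minimal sketch: a generator that yields an
// increasing counter up to a given maximum
function* countTo(max: number) {
  let current = 1;

  while (current <= max) {
    // Pause here and hand the current value to the consumer
    yield current;
    current += 1;
  }

  // Reaching the end (or an explicit "return")
  // transitions the iterator to its done state
}

// Each call creates a fresh iterator,
// consumed here with the for...of loop
for (const value of countTo(3)) {
  console.log(value); // 1, then 2, then 3
}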

💡 And what about iterators?

By definition, an iterator is "an object which defines a sequence and potentially a return value upon its termination". This basically means that an iterator's values can be consumed one at a time until it explicitly transitions to a done state, after which no more values are returned. Since it's tedious to build iterators by hand, we'll leverage generator functions, which were created for exactly that use case.

If you're interested in iterators and how they can be used, head over to MDN
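To see that protocol in action, here's a small sketch that consumes the countTo generator from above by hand instead of with a loop, showing the { value, done } pairs the iterator produces:

// Manually stepping through the iterator
// returned by the countTo generator above
const iterator = countTo(2);

console.log(iterator.next()); // { value: 1, done: false }
console.log(iterator.next()); // { value: 2, done: false }
console.log(iterator.next()); // { value: undefined, done: true }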

🤔 What's a realistic use case though?

Most examples I've looked through so far simply increment and return values, which isn't much of a use case. But as it turns out, generators are amazing for repeatedly accessing a data source instead of loading data in one big batch: think of streaming systems that keep fetching as long as new content is available, or of splitting the records in a database table into segments of limited size.

Our example will fetch records from a data store in small batches, to avoid loading the complete data set into memory and degrading the service's performance to a critical degree.

For small applications, you might not think about these things, but once usage increases, you'll inevitably face them. A single unbounded database request could bring down your entire backend infrastructure, so it pays to design your systems carefully, which is why we'll tackle this problem right at the beginning! To make it clear and simple: pagination is important!

// Limit post batches to 25 at a time
const POST_BATCH_LIMIT = 25;

// Declare our async generator function "batchPosts"
async function* batchPosts(userId: string, client: Client) {
  let skip = 0;

  while (true) {
    // We load a list of posts, limited
    // by the skip & limit parameters
    const posts = await client.loadPosts(userId, {
      skip,
      limit: POST_BATCH_LIMIT
    });

    // In case our database doesn't return
    // any more results, exit the generator
    if (posts.length < 1) {
      return;
    }

    // Increase the skip count
    skip += POST_BATCH_LIMIT;

    // And yield the loaded posts!
    yield posts;
  }
}

export async function processPosts(
  userId: string,
  connect: () => Promise<Client>
) {
  // This is just a mock function
  // to connect to our data store _once_
  const client = await connect();

  // We have to "initialize" our iterator by
  // calling the batchPosts generator function
  const postsIterator = batchPosts(userId, client);

  // The for await will loop over yielded values
  // as long as the postsIterator is not done,
  // that is, until the generator returns
  for await (const posts of postsIterator) {
    // Do something with the loaded posts!
  }
}

And for those interested, this is our mock database implementation, just for reference though.

// Pagination options for the loadPosts query,
// matching the checks below
interface IPaginate {
  skip?: number;
  limit?: number;
}

// Mock database object, this will probably not work 🙃
const Database = {
  async connect() {
    // Do some connect magic
  },
  async loadPosts(userId: string, { skip, limit }: IPaginate) {
    // Build query here

    if (typeof skip === "number") {
      // Add OFFSET to query in SQL environments
    }

    if (typeof limit === "number") {
      // Add LIMIT to query in SQL environments
    }

    // Add some execution logic here
  }
};

Although it might not look like much at first, we've built pretty powerful pagination into our database queries for processing posts, and all of that without much difficulty. In some rare cases, pagination isn't desirable, but as a best practice, we should restrict our business logic in this way, especially when it's that easy.

For the most part, we don't even have to build these features ourselves, as most storage drivers and libraries provide some sort of pagination or cursor logic! But when you're out of luck and thinking about implementing something similar yourself, don't hesitate to try out generators!

And it doesn't even stop here: we can also use keywords to manage the control flow of our for await loop, for example breaking or returning early to cancel loading any further results.
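As a quick sketch (reusing the batchPosts generator and client from above), canceling after the first batch could look like this:

// A minimal sketch: stop after the first batch,
// "break" also finishes the underlying generator
for await (const posts of batchPosts(userId, client)) {
  console.log(`Loaded ${posts.length} posts`);

  // Exit the loop early, no further batches are fetched
  break;
}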

Hopefully this post helped you get some ideas for using (async) generators and iterators, or maybe you're already using them wherever possible! Either way, let me know on Twitter or by sending a mail 👍