Jan 16, 2022

Indexing Pages Programmatically Using the Indexing API

When you manage websites with more than a dozen pages, it can get hard to make sure search engines like Google know when to index newly-added content. Some pages change frequently, and the classic approach of publishing a page and then waiting a couple of days to see whether Google picked it up might not be feasible.

The Indexing API lets you notify Google when pages are updated, or when they are removed completely. Officially, it can only be used for pages containing JobPosting or livestream (BroadcastEvent) structured data, but it does work for other pages, so there’s that.

Notifying Google about an updated (or newly-added) page is as simple as sending a POST request to https://indexing.googleapis.com/v3/urlNotifications:publish with a body like this:

{
  "url": "https://careers.google.com/jobs/google/technical-writer",
  "type": "URL_UPDATED"
}
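
The same endpoint also handles removals: if a page is gone for good, send a URL_DELETED notification instead and Google will drop it from its index.

{
  "url": "https://careers.google.com/jobs/google/technical-writer",
  "type": "URL_DELETED"
}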

Doing this requires setting up a service account linked to your Search Console property, a process I’ll walk through in the next section. Once that’s set up, we can tell Google to index every page listed in our sitemap.

Accessing the Indexing API

First, we’ll need to enable the Indexing API for our Google Cloud project. If you haven’t created a project yet, you can do so now.

Next, we need to create a service account that can access the Indexing API as well as the Search Console.

For this, head over to the service account creation page in the Cloud Console (select your project if it wasn’t selected automatically) and start by entering a name. Then continue, skip the permissions page, and click Done. Back in the service account list, open the newly-created service account, copy its email from the Details tab, then go to the Keys tab, click Add Key, then Create new key. Select JSON and download the file.

Up next, we need to add our service account to the property we want to index. For this, head to Search Console, go to Settings, then Users and permissions, and click Add user. Paste the email of the service account we just created, then select Owner as the role.

That’s it! You just enabled the Indexing API, created a service account that can access it, and granted it access to your property, so indexing will work. We’ll use the JSON key file in the next step!
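
Before wiring up the full script, you can do a quick sanity check. The sketch below (assuming you saved the downloaded key as key.json next to it, and using example.com as a stand-in for one of your own pages) authenticates with the key and asks the Indexing API’s urlNotifications/metadata endpoint what it knows about a URL:

const fetch = require('node-fetch');
const { google } = require('googleapis');

// Assumes the key file from the previous step is saved as key.json.
const key = require('./key.json');

async function check() {
  // Authenticate as the service account with the indexing scope.
  const jwtClient = new google.auth.JWT(
    key.client_email,
    null,
    key.private_key,
    ['https://www.googleapis.com/auth/indexing'],
    null
  );
  const tokens = await jwtClient.authorize();

  // Replace with a page on your own property.
  const url = encodeURIComponent('https://example.com/some-page');
  const res = await fetch(
    `https://indexing.googleapis.com/v3/urlNotifications/metadata?url=${url}`,
    { headers: { Authorization: 'Bearer ' + tokens.access_token } }
  );
  console.log(res.status, await res.text());
}

check();

A 404 just means Google has no notification on record for that URL yet, which is expected at this point; a 403 usually means the service account isn’t an owner of the property.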

Indexing all pages

For this, we’ll use a short Node.js script with the following dependencies (node-fetch is pinned to v2 because v3 is ESM-only and can’t be loaded with require):

npm i node-fetch@2 googleapis

We’ll also provide the JSON key of our service account, collapsed to a single line, via the INDEXER_SERVICE_ACCOUNT_KEY environment variable.
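
If you’re not sure how to get the key onto a single line, letting Node re-serialize it works well (assuming the file is saved as key.json):

node -e "console.log(JSON.stringify(require('./key.json')))"

Export the output as INDEXER_SERVICE_ACCOUNT_KEY locally, and keep it around for the workflow secret we’ll create later. The script itself: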

const fetch = require('node-fetch');
const { google } = require('googleapis');
// The service account key, provided as single-line JSON via the environment.
const key = JSON.parse(process.env.INDEXER_SERVICE_ACCOUNT_KEY);
// The sitemap listing all pages we want indexed (replace with your domain).
const sitemapUrl = 'https://<your domain>/sitemap.xml';

async function main() {
  const sitemap = await fetch(sitemapUrl);
  if (!sitemap.ok) {
    console.error('Could not fetch sitemap');
    console.error(await sitemap.text());
    process.exit(1);
  }
  const sitemapXml = await sitemap.text();
  // Grab every <loc> entry from the sitemap, falling back to an empty list.
  const sitemapEntries = sitemapXml.match(/<loc>(.*?)<\/loc>/g) || [];

  console.log(`Preparing to index ${sitemapEntries.length} pages`);

  // Authenticate as the service account, requesting the indexing scope.
  const jwtClient = new google.auth.JWT(
    key.client_email,
    null,
    key.private_key,
    ['https://www.googleapis.com/auth/indexing'],
    null
  );

  const tokens = await jwtClient.authorize();

  // Send a URL_UPDATED notification for every page in the sitemap.
  for (const entry of sitemapEntries) {
    const urlToIndex = entry.replace('<loc>', '').replace('</loc>', '');

    console.log(`Indexing ${urlToIndex}`);

    const res = await fetch(
      'https://indexing.googleapis.com/v3/urlNotifications:publish',
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: 'Bearer ' + tokens.access_token
        },
        body: JSON.stringify({
          url: urlToIndex,
          type: 'URL_UPDATED'
        })
      }
    );
    if (!res.ok) {
      console.log(`Failed to index ${urlToIndex}`);
      console.error(await res.text());
    }
  }
}

main();

Automating the Indexing Process

To notify Google to index pages whenever we push new code to our site (or on demand), we can set up a GitHub Actions workflow. Before adding it, create a repository secret named INDEXER_SERVICE_ACCOUNT_KEY containing the single-line JSON key from earlier.

name: Index Pages
on:
  workflow_dispatch:
  push:
    branches:
      - main
jobs:
  index:
    name: Index Pages
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
        with:
          node-version: '14'
      - run: npm install
      - run: node index.js
        env:
          INDEXER_SERVICE_ACCOUNT_KEY: ${{ secrets.INDEXER_SERVICE_ACCOUNT_KEY }}

Saving this workflow as .github/workflows/index.yaml in your repository will kick off a run whenever you push to main (or trigger it manually). You can also filter the trigger by path, so the job only runs when the actual content changes.
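
For example, if all your content lives under content/ (adjust the pattern to your setup), the push trigger could be restricted like this:

on:
  workflow_dispatch:
  push:
    branches:
      - main
    paths:
      - 'content/**'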

Service Quotas

It’s important to understand the default service quotas, as you’ll probably run into them if you run the script too often (or for too many pages). By default, Google allows 200 publish requests per day and 600 requests per minute. The 200 publish requests per day will most likely be the limiting factor, so it might be better to publish only the pages that actually changed.
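
If you’d rather have the script stop cleanly once the quota is used up, a small addition right after the fetch call in the script above does the trick, since Google answers quota violations with HTTP 429:

    if (res.status === 429) {
      console.error('Publish quota exhausted, stopping.');
      break;
    }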


That’s everything you need to get your pages indexed, much faster than submitting every page by hand, and completely independent of your technical setup. Depending on the number of pages and how often you run the script, you might want to include only pages you actually updated, so you don’t hit the rate limits. Another option is to request higher service quotas if you expect to hit the limit frequently.