Aug 08, 2021

Setting up Algolia DocSearch with Next.js

When there's a lot of content to go through, finding what you are looking for can become tedious. It's no surprise that most users will simply bounce when they cannot find a resource within a reasonable amount of time.

While structuring your pages well already helps users get to the right section faster, we can do better. Adding search across all pages enables your users to find anything they're interested in almost immediately.

For this purpose, Algolia created DocSearch, which they claim to be the best search experience for docs. Integrating DocSearch requires two things: scraping your live content and adding the search UI.

For the first step, if you are building an open source project, Algolia will scrape your documentation every 24 hours, once you have joined the program. If you are building documentation for your commercial product, don't worry, you can host the DocSearch scraper yourself.

Adding the search interface to your site is the second step, and it's quite trivial once you're set up. In this guide, we'll integrate DocSearch into our Next.js site and scrape the documentation after builds using GitHub Actions.

Setting up Algolia

Before we get started, you need to set up an account with Algolia and create an application.

Once that is done, we'll create a new index; for this guide, I'll call it docs. To access it, we need three API keys: one for scraping, and two for searching in the development and production environments.

For scraping, create an API key scoped to the indices docs and docs_tmp (the latter is used by the DocSearch scraper while a run is in progress) and add the ACL operations addObject, editSettings, and deleteIndex. These are required for DocSearch to perform all of its operations.

For your site, add two API keys, one each for development and production. While both require the search ACL operation, the production key can additionally be scoped to your domain using the HTTP Referer option, so it cannot be used on sites other than your own. You can also limit the maximum allowed API calls per IP per hour if you want to make sure that nobody uses up your quota of 10k search requests per month.

Your development key can be configured to match your local Next.js location or pull request previews.
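
If you prefer scripting over clicking through the dashboard, the keys can also be created with Algolia's JavaScript API client. Below is a rough sketch assuming the algoliasearch v4 client; run it with your admin API key, which must never be shipped to clients.

const algoliasearch = require('algoliasearch');

const client = algoliasearch('<Your App ID>', '<Your admin API key>');

async function createKeys() {
  // Scraper key: write access to the docs index and its temporary twin
  const scraperKey = await client.addApiKey(
    ['addObject', 'editSettings', 'deleteIndex'],
    { indexes: ['docs', 'docs_tmp'] }
  );

  // Production search key: locked to your domain and rate-limited
  const searchKey = await client.addApiKey(['search'], {
    indexes: ['docs'],
    referers: ['https://<Your Domain>/*'],
    maxQueriesPerIPPerHour: 100,
  });

  console.log(scraperKey.key, searchKey.key);
}

createKeys();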

Copy all keys and your application ID somewhere handy for the next steps.

Scraping your docs

Now that our index is all set up, we can start scraping our hosted documentation. As we want to run the scraper in CI, new content must be built and deployed before scraping starts; otherwise, the index would lag behind the live docs.

To configure our scraper, we'll create a docsearch.json file in our repository.

{
  "index_name": "docs",
  "start_urls": [
    "https://<Your Domain>/docs"
  ],
  "sitemap_urls": [
    "https://<Your Domain>/sitemap.xml"
  ],
  "selectors": {
    "lvl0": ".page-heading",
    "lvl1": "h1",
    "lvl2": "h2",
    "lvl3": "h3",
    "lvl4": "h4",
    "lvl5": "h5",
    "text": "p"
  }
}

The index name specifies the location where your scraped documentation content will be stored, ready to search. This needs to match the index we created earlier.

Start URLs tell DocSearch where to begin crawling, and you can specify more than one.

While the sitemap URL is optional, it helps DocSearch find your pages much more reliably than if it had to discover them by crawling alone.

Selectors map your markup to a hierarchy for search, so different levels (headings) are displayed separately from others (regular paragraphs). This should match your content structure.

Note that all content must be server-rendered so that the scraper can fetch it immediately. If your setup does not allow for this, add "js_render": true to the config above. This will start up a Selenium proxy to fetch everything, which naturally takes more resources and time to complete.
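
In context, that's a single extra line merged into the docsearch.json shown above

{
  "index_name": "docs",
  "js_render": true
}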

Now that our scraper is configured, we can run it! Make sure your environment contains the application ID and API key variables as defined below

export ALGOLIA_APP_ID=<Your App ID>
export ALGOLIA_API_KEY=<Your scraping API key>

Once set, we can run the official Docker image Algolia provides for DocSearch. Note that this requires Docker, cat, and jq to be installed.

docker run \
	--env ALGOLIA_APP_ID=${ALGOLIA_APP_ID} \
	--env ALGOLIA_API_KEY=${ALGOLIA_API_KEY} \
	--env "CONFIG=$(cat docsearch.json | jq -r tostring)" \
	algolia/docsearch-scraper

You should see logs of the successful scrape appear after a couple of seconds. If your configuration is invalid, the scraper exits with a somewhat cryptic error telling you what's wrong.

Scraping after every deployment

Whenever we update our docs, we want to scrape the contents so our search experience is always accurate.

We'll set up a GitHub Action that runs either manually or whenever our docs change. If you have a prior build-and-deploy step, or want to attach this to an existing workflow containing one, feel free!

name: Update DocSearch
on:
  # Allow to run manually
  workflow_dispatch:
  push:
    paths:
      # Re-run the action if we change the workflow
      - '.github/workflows/update-docsearch.yaml'

      # This needs to match the location where
      # you store your documentation code and content
      - 'docs/**'
    branches:
      - main
jobs:
  update-docsearch:
    name: Update DocSearch
    runs-on: ubuntu-20.04
    steps:
      # Check out latest code
      - name: Checkout
        uses: actions/checkout@v2
      - name: Update DocSearch
        # If your docs are in the root, omit the line below
        working-directory: docs
        env:
          ALGOLIA_APP_ID: ${{ secrets.DocSearchAppId }}
          ALGOLIA_API_KEY: ${{ secrets.DocSearchApiKey }}
        run: |
          docker run \
            --env ALGOLIA_APP_ID=${ALGOLIA_APP_ID} \
            --env ALGOLIA_API_KEY=${ALGOLIA_API_KEY} \
            --env "CONFIG=$(cat docsearch.json | jq -r tostring)" \
            algolia/docsearch-scraper

As you can see, we run the same command as earlier to start up the DocSearch scraper, but this time, all environment variables are passed in from secrets (DocSearchAppId and DocSearchApiKey), which you'll have to create upfront.
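
If you use the GitHub CLI, one way to create those secrets is from your terminal; the names just have to match the ones referenced in the workflow.

gh secret set DocSearchAppId --body "<Your App ID>"
gh secret set DocSearchApiKey --body "<Your scraping API key>"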

Integrating DocSearch

Now that our documentation is kept in sync, we need to add the search UI. Algolia provides a number of packages for DocSearch; we'll install @docsearch/react.
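
Assuming npm (yarn works just as well), that's

npm install @docsearch/react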

To import all search-related styles across our site, we can add

import '@docsearch/css';

This needs to be done in the custom app file, pages/_app.js.
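
If you don't have a custom app yet, a minimal one could look like this

// pages/_app.js
import '@docsearch/css';

export default function MyApp({ Component, pageProps }) {
  return <Component {...pageProps} />;
}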

Once that is done, we can set up our Search component

import { DocSearch } from '@docsearch/react';
import Head from 'next/head';

export const SEARCH_API_KEY = process.env.NEXT_PUBLIC_ALGOLIA_KEY;
export const SEARCH_APP_ID = process.env.NEXT_PUBLIC_ALGOLIA_APP_ID;

export function Search() {
  if (!SEARCH_APP_ID || !SEARCH_API_KEY) {
    return null;
  }

  return (
    <>
      <Head>
        <link
          rel="preconnect"
          href={`https://${SEARCH_APP_ID}-dsn.algolia.net`}
          crossOrigin="anonymous"
        />
      </Head>
      <DocSearch
        apiKey={SEARCH_API_KEY}
        indexName="docs"
        appId={SEARCH_APP_ID}
      />
    </>
  );
}

We simply use the DocSearch component provided by Algolia and, optionally, preconnect to the Algolia host to speed up the first search interaction.

For this to work, we need to define the following environment variables in our env files

NEXT_PUBLIC_ALGOLIA_KEY=<The public search API key for the current environment>
NEXT_PUBLIC_ALGOLIA_APP_ID=<App ID>

Once you have defined those and restarted Next.js, you can add the search component wherever you like and start searching! Make sure you use the production key for production deployments, though!
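
For example, you could render it inside a shared header; the Header component and import path below are placeholders for your own layout.

import { Search } from '../components/Search';

export function Header() {
  return (
    <header>
      <nav>{/* site navigation */}</nav>
      <Search />
    </header>
  );
}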

There's a lot more you can do, from styling DocSearch and changing how the component behaves in general, to handling different locales and versions with facetFilters.
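
As a taste of the latter, the DocSearch component accepts a searchParameters prop; here's a sketch that assumes your index records carry a language attribute to facet on.

<DocSearch
  apiKey={SEARCH_API_KEY}
  indexName="docs"
  appId={SEARCH_APP_ID}
  searchParameters={{
    // only return results for English content
    facetFilters: ['language:en'],
  }}
/>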