Nov 15, 2020

Scraping Country Codes

For a recent project, I needed a list of all countries, their country codes, and an emoji for their flag.

Two-letter country codes (the alpha-2 codes defined in ISO 3166-1) are a common way to identify countries. Unfortunately, to download the official data set as CSV, you'll have to pay $330 first. Luckily, all country codes are listed on the Wikipedia page.

So naturally, I tried to extract all the data I needed from Wikipedia, right in the browser.

let tableRows = Array.from(
  // Get all table rows
  document.querySelectorAll('table.wikitable:nth-child(23) > tbody > tr')
);
// -> [tr, tr, tr, tr, ...]

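In the browser console, we can transform these table rows into the data we need. For each row, the first cell carries the country code in its id attribute, and the second cell contains a link whose title is the country name. The link sometimes sits one level deeper, so we step into the first child when it isn't an anchor. Finally, we sort the result by country name.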

tableRows
  .map(e => {
    const countryCode = e.children[0].getAttribute('id');

    let nameEl = e.children[1].children[0];
    if (nameEl.nodeName !== 'A') {
      nameEl = nameEl.children[0];
    }

    return [countryCode, nameEl.getAttribute('title')];
  })
  .sort((a, b) => a[1].localeCompare(b[1]));
// -> [["DE","Germany"], ...]
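The snippet above assumes every row has the expected shape. As a rough sketch, the same extraction can be wrapped in a small helper that skips rows it cannot read. The helper name `extractRow` is hypothetical, and the idea that non-country rows (a header row, for example) have no id on their first cell is my assumption, not something the page guarantees.

```javascript
// Hypothetical helper: returns [code, name] for a table row, or null when
// the row does not look like a country row (assumption: such rows have no
// id attribute on their first cell).
function extractRow(row) {
  const codeCell = row.children[0];
  const nameCell = row.children[1];
  if (!codeCell || !nameCell) return null;

  const countryCode = codeCell.getAttribute('id');
  if (!countryCode) return null;

  // The link may sit directly in the cell or one level deeper.
  let nameEl = nameCell.children[0];
  if (nameEl && nameEl.nodeName !== 'A') {
    nameEl = nameEl.children[0];
  }

  const name = nameEl ? nameEl.getAttribute('title') : null;
  return name ? [countryCode, name] : null;
}

// Usage in the browser console, with tableRows as defined earlier:
// tableRows
//   .map(extractRow)
//   .filter(Boolean)
//   .sort((a, b) => a[1].localeCompare(b[1]));
```

Filtering out the null entries before sorting keeps `localeCompare` from being called on a missing name.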

Now that we've collected country codes and names, we're only missing emojis.

For this, I took the emoji JSON list from node-emoji, which I could parse and use to look up the flags.

// Parse the JSON with all emojis
let allEmojis = JSON.parse(`{"flag-de": "🇩🇪"}`);

Now that our emojis are in scope, let's update our previous scraping logic to include the country's flag emoji in the final array.

tableRows
  .map(e => {
    const countryCode = e.children[0].getAttribute('id');

    let nameEl = e.children[1].children[0];
    if (nameEl.nodeName !== 'A') {
      nameEl = nameEl.children[0];
    }

    return [
      countryCode,
      nameEl.getAttribute('title'),
      allEmojis[`flag-${countryCode.toLowerCase()}`]
    ];
  })
  .sort((a, b) => a[1].localeCompare(b[1]));
// -> [["DE","Germany", "🇩🇪"], ...]
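If a code has no entry in the emoji list, the lookup above yields undefined. As a possible fallback (a different technique from the JSON lookup, not what I used here), a flag emoji can be built directly from the two-letter code, because flag emojis are encoded as a pair of Unicode regional indicator symbols. The function name `flagFromCode` is hypothetical.

```javascript
// Code point of REGIONAL INDICATOR SYMBOL LETTER A (U+1F1E6).
const REGIONAL_INDICATOR_A = 0x1f1e6;

// Hypothetical helper: maps a two-letter code such as "DE" to its flag
// emoji by shifting each letter into the regional indicator range.
function flagFromCode(countryCode) {
  const code = String(countryCode).toUpperCase();
  if (!/^[A-Z]{2}$/.test(code)) return undefined;

  return String.fromCodePoint(
    ...[...code].map(ch => REGIONAL_INDICATOR_A + ch.charCodeAt(0) - 65)
  );
}

flagFromCode('DE'); // -> "🇩🇪"
```

Inside the map callback this could be combined with the lookup, for example by falling back to `flagFromCode(countryCode)` when the `allEmojis` entry is missing. Whether the result renders as a flag still depends on the platform's emoji font.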

This solution is simple, and it gets the job done. From here, you can generate whatever you need out of it: another JSON file to embed in your project, code to be used by your application, and so on. The possibilities are endless 👍
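As one example of that last step, here is a minimal sketch that turns the scraped triples into a JSON string. The sample data, the `toJson` name, and the `code`/`name`/`flag` property names are all illustrative choices, not part of the scraping code above.

```javascript
// Sample data in the shape produced by the scraping step: [code, name, flag].
const countries = [['DE', 'Germany', '🇩🇪']];

// Hypothetical helper: converts each triple into an object and serializes
// the list as pretty-printed JSON.
function toJson(rows) {
  const objects = rows.map(([code, name, flag]) => ({ code, name, flag }));
  return JSON.stringify(objects, null, 2);
}

const json = toJson(countries);
console.log(json);
```

Named properties make the embedded file easier to read than positional arrays, at the cost of a slightly larger payload.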