Downloading Who's On First

State

At rest every Who's On First record is an atomic GeoJSON-encoded text file. This decision addresses concerns about the portability and longevity of the project but can make it a challenge to get started. For that reason we provide a variety of pre-packaged distributions, in a number of formats, of Who's On First data organized by placetype and country.

Distributions

Distributions of data produced by the Who's On First project are sponsored by Geocode Earth (thank you!). Distributions include full planet and by-country download options in several formats.

https://geocode.earth/data/whosonfirst

This data is made available under the terms of the Who's On First License.

In order to talk about distributions we need to take a moment to talk about how the raw data is stored and organized.

If you haven't already read the How the Who's On First data repositories are organized section below, now would be a good time to take a look because it informs how we are currently building the distributions listed here. Go ahead, we'll wait...

NOTE: All distributions are derived from individual per-country repositories in the whosonfirst-data GitHub organization.

Download SQLite databases

Who's On First data exported as a set of normalized SQLite database tables: ancestors, concordances, geojson, geometries, names, spr (which is an acronym for "standard places response").
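A quick way to see what a downloaded database actually contains is to list its tables with Python's standard sqlite3 module. This is only a minimal sketch: the file name in the usage comment is a placeholder rather than a real distribution file name, and the helper names are illustrative. It deliberately makes no assumptions about column names.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def count_rows(conn, table):
    """Count the rows in a table, refusing names that are not real tables."""
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    # Interpolating is acceptable here because the name was just checked
    # against the database's own list of tables.
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count


# Usage (placeholder path, substitute the file you downloaded):
#   conn = sqlite3.connect("whosonfirst-example.db")
#   for table in list_tables(conn):
#       print(table, count_rows(conn, table))
```

If the download matches the description above, the listing should include the ancestors, concordances, geojson, geometries, names and spr tables.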

Download Shapefiles

Shapefile format includes most SPR properties and additional denormalized properties from the other SQLite tables.

Download "Bundles"

"Bundles" are basically Who's On First Git repositories without the .git directories (because they are often very large and of little use to people who just want to work with the raw data).

Raw GeoJSON files in GitHub repositories

All the Who's On First data and its complete edit history are available on GitHub, in the whosonfirst-data organization.
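Because every record is its own GeoJSON file, a local clone of one of these repositories (or an unpacked bundle) can be processed with nothing more than a directory walk. The sketch below assumes that records are files ending in .geojson somewhere under the root directory; the function names are illustrative and the directory layout is not assumed beyond that. It tallies records by the wof:repo property described later in this document.

```python
import json
from pathlib import Path


def iter_records(root):
    """Yield (path, feature) for every .geojson file found under root."""
    for path in sorted(Path(root).rglob("*.geojson")):
        with path.open(encoding="utf-8") as fh:
            yield path, json.load(fh)


def count_by_repo(root):
    """Tally records by their wof:repo property (None when it is absent)."""
    counts = {}
    for _, feature in iter_records(root):
        repo = feature.get("properties", {}).get("wof:repo")
        counts[repo] = counts.get(repo, 0) + 1
    return counts


# Usage (placeholder path to a local clone or unpacked bundle):
#   print(count_by_repo("/path/to/whosonfirst-data"))
```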

How the Who's On First data repositories are organized

We use the Git version control system to manage Who's On First data. One of the present-day limits of Git is the number of atomic files you can store in a single Git "repository". We believe that eventually it will be possible to keep all 26 million records in a single Git repository but we are not there yet.

Instead of a single monolithic repository we have grouped Who's On First data as follows:

  • Administrative Data - all administrative placetypes (everything from continents to microhoods, inclusive) for the entire world
  • Everything else - all other placetypes including venues, postalcodes, constituencies and intersections

The naming convention for Who's On First data repositories at its most granular looks like this:

	whosonfirst-data + "-" + WHOSONFIRST_PLACETYPE + "-" + WHOSONFIRST_COUNTRY + "-" + WHOSONFIRST_SUBDIVISION
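In code, the convention amounts to joining the components that are present with hyphens. The following is a minimal sketch, not a project API: the function name is hypothetical, and it assumes the caller passes each component already in the form used in repository names.

```python
def repo_name(placetype=None, country=None, subdivision=None):
    """Build a repository name from the most general part to the most granular.

    Components are appended in order and the name stops at the first one
    that is missing, so a subdivision without a country is ignored.
    """
    parts = ["whosonfirst-data"]
    for part in (placetype, country, subdivision):
        if not part:
            break
        parts.append(part)
    return "-".join(parts)


# Usage:
#   repo_name("venue", "us", "ca")   -> "whosonfirst-data-venue-us-ca"
#   repo_name("postalcode", "ca")    -> "whosonfirst-data-postalcode-ca"
#   repo_name()                      -> "whosonfirst-data"
```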

There are some important points to keep in mind about these conventions:

  • All administrative data is stored in the whosonfirst-data repository. Administrative data is not subdivided by country or placetype.
  • We try to use the shortest name whenever possible. The only reason for subdividing a placetype by country (or a country by subdivision) is to address operational concerns of working with the data in Git or GitHub.
  • As a general rule every Who's On First document should have an explicit wof:repo property containing the name of its parent repository.
  • Who's On First documents should also contain all the properties necessary to reconstruct their wof:repo name, allowing developers to validate that name by testing for the presence of a matching repository, starting with the most granular name and working backwards.
  • The Who's On First data model allows for all of these repositories to be merged in to a single tree structure without any collisions. If a Who's On First record is accidentally stored in or saved to an inappropriate repository that is considered an inconvenience (to be fixed) but not an error.
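The "most granular name first, then work backwards" check described above can be sketched as follows. Both function names are hypothetical, and known_repos stands in for whatever mechanism is actually used to test whether a repository exists (a local directory listing, a cached list of repository names, and so on).

```python
def candidate_repo_names(placetype, country=None, subdivision=None):
    """List possible repository names, most granular first."""
    parts = [p for p in (placetype, country, subdivision) if p]
    return [
        "-".join(["whosonfirst-data"] + parts[:n])
        for n in range(len(parts), -1, -1)
    ]


def resolve_repo(candidates, known_repos):
    """Return the first candidate present in known_repos, or None."""
    for name in candidates:
        if name in known_repos:
            return name
    return None


# Usage, with an illustrative set of repository names:
#   known = {"whosonfirst-data", "whosonfirst-data-venue-us-ca"}
#   resolve_repo(candidate_repo_names("venue", "us", "ca"), known)
```

Trying the longest name first means a record resolves to the most specific repository that exists, and falls back to a shorter name only when no subdivided repository is found.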

Administrative Data

Administrative data is located in the whosonfirst-data repository. This repository contains all administrative placetypes (everything from continents to microhoods, inclusive) for the entire world.

You can find data for the following placetypes in the whosonfirst-data repository: continent, empire, country, macroregion, region, macrocounty, county, locality, macrohood, neighbourhood, microhood

Other Data

Venues

There are over 20 million venues in Who's On First, with about 60% in the USA. Venues in the USA are grouped in to whosonfirst-data-venue-us-{WHOSONFIRST_SUBDIVISION} repositories, while everything else is grouped in to whosonfirst-data-venue-{WHOSONFIRST_COUNTRY} repositories.
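The grouping rule for venues can be expressed as a small function. This is a sketch under stated assumptions: the function name is hypothetical, and the country and subdivision codes are assumed to be passed exactly as they appear in repository names (for example "us"), since no case conversion is applied.

```python
def venue_repo(country, subdivision=None):
    """Return the venue repository name for a country (and US subdivision)."""
    if country == "us":
        # Venues in the USA are split further, one repository per subdivision.
        if not subdivision:
            raise ValueError("venues in the USA require a subdivision")
        return f"whosonfirst-data-venue-us-{subdivision}"
    # Everywhere else, venues are grouped one repository per country.
    return f"whosonfirst-data-venue-{country}"


# Usage:
#   venue_repo("us", "ca")  -> "whosonfirst-data-venue-us-ca"
#   venue_repo("fr")        -> "whosonfirst-data-venue-fr"
```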

There is also a general purpose whosonfirst-data-venue repository which contains no data itself, only pointers to all the venue-related repositories that do.

Postal Codes

Postal codes are grouped in to whosonfirst-data-postalcode-{WHOSONFIRST_COUNTRY} repositories.

There is also a general purpose whosonfirst-data-postalcode repository which contains no data itself, only pointers to all the postal code-related repositories that do.

Constituencies

Constituencies are available for only a select number of countries (two to be exact: the USA and Canada) as we work through what it means to include constituencies in Who's On First. If you have constituencies from other countries we'd love to include them too.

Intersections

Intersections are a still-experimental placetype in Who's On First, currently only available for New York City (and specifically Manhattan).