Downloading Who's On First
At rest every Who's On First record is an atomic GeoJSON-encoded text file. This decision addresses concerns about the portability and longevity of the project but can make it a challenge to get started. For that reason we provide a variety of pre-packaged distributions, in a number of formats, of Who's On First data organized by placetype and country.
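To make the "one record, one GeoJSON file" idea concrete, here is a minimal Python sketch that parses a single record and pulls out a few properties. The sample record is made up and heavily trimmed, and the helper name summarize_record is hypothetical. Real records carry many more properties than the handful shown here.

```python
import json

# A made-up, heavily trimmed record used purely for illustration.
SAMPLE_RECORD = """
{
  "type": "Feature",
  "id": 1,
  "properties": {
    "wof:id": 1,
    "wof:name": "Example Place",
    "wof:placetype": "locality",
    "wof:repo": "whosonfirst-data"
  },
  "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}
}
"""


def summarize_record(record_text):
    """Parse one GeoJSON Feature and return a few of its wof: properties."""
    feature = json.loads(record_text)
    props = feature.get("properties", {})
    return {
        "id": props.get("wof:id"),
        "name": props.get("wof:name"),
        "placetype": props.get("wof:placetype"),
        "repo": props.get("wof:repo"),
    }


summary = summarize_record(SAMPLE_RECORD)
print(summary)
```

Because each record is a plain text file, the same function should work on the contents of any file read from a repository checkout or a bundle.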
Distributions
Distributions of data produced by the Who's On First project are sponsored by Geocode Earth (thank you!). Distributions include full planet and by-country download options in several formats.
https://geocode.earth/data/whosonfirst
This data is made available under the terms of the Who's On First License.
In order to talk about distributions we need to take a moment to talk about how the raw data is stored and organized.
If you haven't already read the "How the Who's On First data repositories are organized" section below, now would be a good time to take a look because it informs how we are currently building the distributions listed here. Go ahead, we'll wait...
NOTE: All distributions are derived from individual per-country repositories in the whosonfirst-data GitHub organization.
Download SQLite databases
Who's On First data exported as a set of normalized SQLite database tables: ancestors, concordances, geojson, geometries, names, spr (which is an acronym for "standard places response").
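As a rough sketch of how the spr table might be queried, the following Python uses the standard library sqlite3 module. The column names (id, name, placetype, country), the helper name places_in_country and the small in-memory table are all assumptions made for illustration. Inspect the schema of the database you actually download before relying on them.

```python
import sqlite3


def places_in_country(conn, country, placetype="locality"):
    """Return (id, name) rows from the spr table for one country and placetype."""
    query = (
        "SELECT id, name FROM spr "
        "WHERE country = ? AND placetype = ? "
        "ORDER BY name"
    )
    return conn.execute(query, (country, placetype)).fetchall()


# Stand-in for a downloaded distribution: an in-memory table with assumed
# columns and made-up rows. With a real file you would instead call
# sqlite3.connect("path/to/downloaded.db").
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spr (id INTEGER PRIMARY KEY, name TEXT, placetype TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO spr VALUES (?, ?, ?, ?)",
    [
        (1, "Example City", "locality", "XX"),
        (2, "Another Town", "locality", "XX"),
        (3, "Example Region", "region", "XX"),
        (4, "Elsewhere", "locality", "YY"),
    ],
)

rows = places_in_country(conn, "XX")
for place_id, name in rows:
    print(place_id, name)
```

Passing the country and placetype as query parameters, rather than formatting them into the SQL string, keeps the query safe to reuse with arbitrary input.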
Download Shapefiles
Shapefile format includes most SPR properties and additional denormalized properties from the other SQLite tables.
Download "Bundles"
"Bundles" are basically Who's On First Git repositories without the .git
directories (because they are often very large and of little use to people who just want to work with the raw data).
Raw GeoJSON files in GitHub repositories
All the Who's On First data and its complete edit history is available on GitHub:
How the Who's On First data repositories are organized
We use the Git version control system to manage Who's On First data. One of the present-day limits of Git is the number of atomic files you can store in a single Git "repository". We believe that eventually it will be possible to keep all 26 million records in a single Git repository but we are not there yet.
Instead of a single monolithic repository we have grouped Who's On First data as follows:
- Administrative Data - all administrative placetypes (every placetype from continent to microhood, inclusive) for the entire world
- Everything else - all other placetypes including venues, postalcodes, constituencies and intersections
The naming convention for Who's On First data repositories at its most granular looks like this:
whosonfirst-data + "-" + WHOSONFIRST_PLACETYPE + "-" + WHOSONFIRST_COUNTRY + "-" + WHOSONFIRST_SUBDIVISION
There are some important points to keep in mind about these conventions:
- All administrative data is stored in the whosonfirst-data repository. Administrative data is not subdivided by country or placetype.
- We try to use the shortest name whenever possible. The only reason for subdividing a placetype by country (or a country by subdivision) is to address operational concerns of working with the data in Git or GitHub.
- As a general rule every Who's On First document should have an explicit wof:repo property containing the name of its parent repository.
- Who's On First documents should also contain all the properties necessary to reconstruct their wof:repo name, allowing developers to validate that name by testing for the presence of a matching repository, starting with the most granular name and working backwards.
- The Who's On First data model allows for all of these repositories to be merged into a single tree structure without any collisions. If a Who's On First record is accidentally stored in or saved to an inappropriate repository that is considered an inconvenience (to be fixed) but not an error.
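The "most granular name first, then work backwards" check described above can be sketched in Python as follows. This is only an illustration: the function names, the lowercasing of components and the example values ("venue", "us", "ca") are assumptions, and the exists callable is a placeholder for however you choose to test that a repository is present (a local directory listing, a hosting API, and so on).

```python
def candidate_repo_names(placetype=None, country=None, subdivision=None):
    """List possible repository names, most granular first.

    Components are used in order (placetype, country, subdivision) and the
    list stops at the first missing one, mirroring the naming convention.
    """
    parts = ["whosonfirst-data"]
    for part in (placetype, country, subdivision):
        if not part:
            break
        parts.append(part.lower())
    # Drop one trailing component at a time to work backwards.
    return ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]


def first_existing(candidates, exists):
    """Return the first candidate for which exists(name) is true, else None."""
    for name in candidates:
        if exists(name):
            return name
    return None


candidates = candidate_repo_names("venue", "us", "ca")
print(candidates)

# A hypothetical set of known repositories stands in for a real lookup.
known = {"whosonfirst-data", "whosonfirst-data-venue-us"}
print(first_existing(candidates, known.__contains__))
```

Keeping the existence test as a parameter means the name-building logic can be exercised without any network access.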
Administrative Data
Administrative data is located in the whosonfirst-data repository. This repository contains all administrative placetypes (every placetype from continent to microhood, inclusive) for the entire world.
You can find data for the following placetypes in the whosonfirst-data repository: continent, empire, country, macroregion, region, macrocounty, county, locality, macrohood, neighbourhood, microhood
Other Data
Venues
There are over 20 million venues in Who's On First, with about 60% in the USA. Venues in the USA are grouped into whosonfirst-data-venue-us-{WHOSONFIRST_SUBDIVISION} repositories, while everything else is grouped into whosonfirst-data-venue-{WHOSONFIRST_COUNTRY} repositories.
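The venue grouping rule can be expressed as a small Python helper. This is a sketch under assumptions: the function name venue_repo_name is hypothetical, and lowercase country and subdivision codes are assumed because the repository names shown elsewhere on this page (for example whosonfirst-data-constituency-us) use them.

```python
def venue_repo_name(country, subdivision=None):
    """Build the expected venue repository name for a country.

    US venues are grouped by subdivision; all other countries get a single
    per-country repository.
    """
    country = country.lower()
    if country == "us":
        if not subdivision:
            raise ValueError("US venues are grouped by subdivision; one is required")
        return f"whosonfirst-data-venue-us-{subdivision.lower()}"
    return f"whosonfirst-data-venue-{country}"


print(venue_repo_name("US", "NY"))
print(venue_repo_name("ca"))
```

For non-US countries any subdivision argument is simply ignored, which matches the per-country grouping described above.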
There is also a general purpose whosonfirst-data-venue repository which contains no data but pointers to all the venue-related repositories that do:
Postal Codes
Postal codes are grouped into whosonfirst-data-postalcode-{WHOSONFIRST_COUNTRY} repositories.
There is also a general purpose whosonfirst-data-postalcode repository which contains no data but pointers to all the postal code-related repositories that do:
Constituencies
Constituencies are available for only a select number of countries (two to be exact: the USA and Canada) as we work through what it means to include constituencies in Who's On First. If you have constituencies from other countries we'd love to include them too.
- https://github.com/whosonfirst-data/whosonfirst-data-constituency-ca
- https://github.com/whosonfirst-data/whosonfirst-data-constituency-us
Intersections
Intersections are a still-experimental placetype in Who's On First, currently only available for New York City (and specifically Manhattan).