Shapefiles
Introduced in 2023, shapefile downloads for the Who’s On First gazetteer are available as per-country ZIP archives in admin (with country, region, county, locality, neighbourhood, and other placetypes), postalcode, and constituency bundles.
The Shapefile distribution's properties start from a core subset of the standard place response (SPR) properties, adapted to the format's limitations (a single geometry type per file and short DBF column names), and add selected commonly used properties from the SQLite distribution, denormalized into a single DBF table.
type ShapefileSchema interface {
    // Standard Place Response (SPR) fields
    id() integer
    parent_id() integer
    name() string
    placetype() string
    country() string
    repo() string
    lat() float
    lon() float
    min_lat() float
    min_lon() float
    max_lat() float
    max_lon() float
    modified() date

    // Localized names
    name_ara() string
    name_ben() string
    name_deu() string
    name_eng() string
    name_ell() string
    name_fas() string
    name_fra() string
    name_heb() string
    name_hin() string
    name_hun() string
    name_ind() string
    name_ita() string
    name_jpn() string
    name_kor() string
    name_nld() string
    name_pol() string
    name_por() string
    name_rus() string
    name_spa() string
    name_swe() string
    name_tur() string
    name_ukr() string
    name_urd() string
    name_vie() string
    name_zho() string

    // Concordances (Wikidata IDs such as "Q60" are strings)
    gn_id() integer
    wd_id() string
    concord_id() string
    concord_ke() string
    iso_code() string
    hasc_id() string

    // Hierarchy
    country_id() integer
    region_id() integer
    county_id() integer

    // Other goodies
    population() integer
    placetype_local() string
    is_funky() integer
    min_zoom() float
    max_zoom() float
    min_label() float
    max_label() float
    geom_src() string
}
Shapefile ZIP archive file layout
Since shapefiles don’t support mixed geometry types, there can be two shapefiles for each WOF placetype, such as “locality-point” and “locality-polygon”. The individual file components of each shapefile (with extensions such as shp, shx, dbf, prj, and cpg) are collected into a single compressed ZIP archive.
For example, the US admin archive would contain two shapefiles for the locality placetype, one per geometry type:
- whosonfirst-data-admin-us-locality-polygon.shp
- whosonfirst-data-admin-us-locality-point.shp
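The naming pattern in the example above can be sketched in Go. This is a minimal illustration that assumes every shapefile in a bundle follows the `whosonfirst-data-{bundle}-{country}-{placetype}-{geometry}` pattern seen in the US locality file names; `componentFiles` is a hypothetical helper, and a given archive may contain only one of the two geometry variants for a placetype.

```go
package main

import (
	"fmt"
	"strings"
)

// componentExts are the per-shapefile component extensions listed above.
var componentExts = []string{"shp", "shx", "dbf", "prj", "cpg"}

// componentFiles returns the expected component file names for one shapefile.
// ASSUMPTION: names follow the pattern of the US locality example,
// whosonfirst-data-{bundle}-{country}-{placetype}-{geometry}.{ext},
// with a lowercase country code.
func componentFiles(bundle, country, placetype, geometry string) []string {
	base := strings.Join([]string{
		"whosonfirst-data", bundle, strings.ToLower(country), placetype, geometry,
	}, "-")
	files := make([]string, 0, len(componentExts))
	for _, ext := range componentExts {
		files = append(files, base+"."+ext)
	}
	return files
}

func main() {
	// One shapefile per geometry type for the same placetype.
	for _, geom := range []string{"point", "polygon"} {
		fmt.Println(componentFiles("admin", "US", "locality", geom))
	}
}
```

Treat the generated names as expectations to check against the archive listing, not as a guarantee that every file exists.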
Shapefile Data Schema
The Shapefile distribution includes standard place response (SPR) properties and additional properties. Because DBF limits field names to 10 characters (and truncating some column names would make them ambiguous), we rename fields explicitly: the “field” column below gives the DBF name, and “field_full” gives the full SPR field name used in the other formats, or the full WOF property name. Some shapefile properties can coalesce values from multiple WOF source properties.
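The 10-character limit comes from the dBase header, where each field descriptor stores its name in an 11-byte slot (10 characters plus a terminating NUL). To check the exact column names in a downloaded `.dbf`, you can read the field descriptors directly. The sketch below assumes a dBase III-style header (descriptors of 32 bytes starting at offset 32, terminated by 0x0D); `parseDBFFields` and `DBFField` are illustrative names, not part of any WOF tooling.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"os"
	"strings"
)

// DBFField describes one column from a dBase (DBF) file header.
type DBFField struct {
	Name   string // at most 10 characters, stored NUL-padded in 11 bytes
	Type   byte   // e.g. 'C' character, 'N' numeric, 'F' float, 'D' date
	Length int    // field width in bytes
}

// parseDBFFields reads the field descriptors from a dBase III-style header.
func parseDBFFields(header []byte) ([]DBFField, error) {
	if len(header) < 32 {
		return nil, errors.New("dbf: header too short")
	}
	// Bytes 8-9 hold the total header length (little-endian).
	headerLen := int(binary.LittleEndian.Uint16(header[8:10]))
	if headerLen > len(header) {
		return nil, errors.New("dbf: truncated header")
	}
	var fields []DBFField
	// Each descriptor is 32 bytes; the list ends with a 0x0D terminator byte.
	for off := 32; off+32 <= headerLen && header[off] != 0x0D; off += 32 {
		d := header[off : off+32]
		name := string(d[:11])
		if i := strings.IndexByte(name, 0); i >= 0 {
			name = name[:i] // cut at the first NUL
		}
		fields = append(fields, DBFField{Name: name, Type: d[11], Length: int(d[16])})
	}
	return fields, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: dbffields FILE.dbf")
		return
	}
	// Reads the whole file for simplicity; only the header is actually needed.
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println(err)
		return
	}
	fields, err := parseDBFFields(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, f := range fields {
		fmt.Printf("%-10s %c %d\n", f.Name, f.Type, f.Length)
	}
}
```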
Standard Place Response (SPR) fields
With Shapefile we export most but not all of the core SPR properties.
field | field_full | type | description |
id | Id | integer | The unique ID of the place |
parent_id | ParentId | integer | The unique parent ID of the place. Negative values indicate “complicated”. |
name | Name | string | The default name of the place (mostly English, mostly ASCII-7) |
placetype | Placetype | string | The Who’s On First placetype of the place |
country | Country | string | The two-letter country code of the place |
repo | Repo | string | The (Git) repository name where the source record for the place is stored. |
lat | Latitude | float | The latitude for the principal centroid (typically “label”) of the place |
lon | Longitude | float | The longitude for the principal centroid (typically “label”) of the place |
min_lat | MinLatitude | float | The minimum latitude of the bounding box of the place |
min_lon | MinLongitude | float | The minimum longitude of the bounding box of the place |
max_lat | MaxLatitude | float | The maximum latitude of the bounding box of the place |
max_lon | MaxLongitude | float | The maximum longitude of the bounding box of the place |
modified | LastModified | date | When the place was last modified (stored as a Unix timestamp in the SPR) |
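If you combine shapefile attributes with data from the GeoJSON or SQLite distributions, you may want the full SPR names back. A minimal Go sketch, assuming each DBF record has already been read into a `map[string]any` by whatever shapefile reader you use; the map shape and `toSPRNames` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// sprFieldNames maps DBF column names to full SPR field names, per the table above.
var sprFieldNames = map[string]string{
	"id": "Id", "parent_id": "ParentId", "name": "Name", "placetype": "Placetype",
	"country": "Country", "repo": "Repo", "lat": "Latitude", "lon": "Longitude",
	"min_lat": "MinLatitude", "min_lon": "MinLongitude",
	"max_lat": "MaxLatitude", "max_lon": "MaxLongitude",
	"modified": "LastModified",
}

// toSPRNames returns a copy of a DBF attribute record keyed by full SPR names.
// Columns without an SPR equivalent (e.g. name_eng, gn_id) keep their DBF names.
func toSPRNames(rec map[string]any) map[string]any {
	out := make(map[string]any, len(rec))
	for k, v := range rec {
		if full, ok := sprFieldNames[k]; ok {
			k = full
		}
		out[k] = v
	}
	return out
}

func main() {
	// Hypothetical record for illustration.
	rec := map[string]any{"id": int64(1), "min_lat": 40.5, "name_eng": "Example"}
	out := toSPRNames(rec)
	keys := make([]string, 0, len(out))
	for k := range out {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [Id MinLatitude name_eng]
}
```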
Names
To provide a more ergonomic Shapefile experience, the SPR is pre-joined to the names table and includes 25 localized names, when available, for:
Arabic, Bengali, Chinese (simplified and/or traditional), Dutch, English, Farsi, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Ukrainian, Urdu, and Vietnamese
Who’s On First uses RFC 5646 (BCP 47) language tags for names. For each of the preferred locales listed here, the shapefile stores the name in a name_{locale} property keyed by the 3-character language code (so name_eng for English).
The list of supported Shapefile languages is adapted from Natural Earth and Tilezen’s list of core languages. Arabic, Chinese, English, French, Russian and Spanish are used by the United Nations for meetings and official documents. The other languages listed are either proposed as an official language of the United Nations (Bengali, Hindi, Portuguese, and Turkish) or frequently used in OpenStreetMap, Who’s On First, or Wikipedia.
NOTE: Several hundred other languages are supported in the SQLite distributions in the “names” table.
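Because localized names are included only “when available”, a map client usually needs a fallback order. A Go sketch that tries a list of preferred 3-character codes and falls back to the default `name` column; it assumes a missing localized name reads as an empty string, and `localizedName` is a hypothetical helper.

```go
package main

import "fmt"

// localizedName returns the first non-empty name_{code} value for the preferred
// 3-character language codes, falling back to the default "name" column.
// ASSUMPTION: a missing localized name is an empty string in the record.
func localizedName(rec map[string]string, preferred ...string) string {
	for _, code := range preferred {
		if v := rec["name_"+code]; v != "" {
			return v
		}
	}
	return rec["name"]
}

func main() {
	// Hypothetical record: a German name is present, a Swedish one is not.
	rec := map[string]string{"name": "Cologne", "name_deu": "Köln", "name_swe": ""}
	fmt.Println(localizedName(rec, "swe", "deu")) // Köln
	fmt.Println(localizedName(rec, "swe"))        // Cologne
}
```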
Languages and their name fields
field | 3-char | 2-char | language | native name |
name_ara | ara | ar | Arabic | العربية |
name_ben | ben | bn | Bengali | বাংলা |
name_deu | deu | de | German | Deutsch |
name_eng | eng | en | English | English |
name_ell | ell | el | Greek | ελληνικά |
name_fas | fas | fa | Farsi | فارسی |
name_fra | fra | fr | French | français |
name_heb | heb | he | Hebrew | עִבְרִית |
name_hin | hin | hi | Hindi | हिन्दी |
name_hun | hun | hu | Hungarian | magyar |
name_ind | ind | id | Indonesian | Bahasa Indonesia |
name_ita | ita | it | Italian | italiano |
name_jpn | jpn | ja | Japanese | 日本語 |
name_kor | kor | ko | Korean | 한국어 |
name_nld | nld | nl | Dutch | Nederlands |
name_pol | pol | pl | Polish | Polski |
name_por | por | pt | Portuguese | Português |
name_rus | rus | ru | Russian | Русский |
name_spa | spa | es | Spanish | español |
name_swe | swe | sv | Swedish | Svenska |
name_tur | tur | tr | Turkish | Türkçe |
name_ukr | ukr | uk | Ukrainian | українська |
name_urd | urd | ur | Urdu | اردو |
name_vie | vie | vi | Vietnamese | Tiếng Việt |
name_zho | zho | zh | Chinese | 中文 |
NOTE: Chinese names may include a mix of simplified and traditional Chinese characters. The SQLite distribution tries to indicate Simplified or Traditional characters with additional country tags.
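Applications often carry locale tags in the 2-character BCP 47 form (such as `pt-BR`), while the shapefile columns use the 3-character codes from the table above. A small Go sketch of that lookup; `nameField` is a hypothetical helper, and it only inspects the primary language subtag, so regional or script variants such as `zh-Hant` collapse onto the same column, consistent with the mixed-script note on Chinese names.

```go
package main

import (
	"fmt"
	"strings"
)

// twoToThree maps the 2-character codes in the table above to the
// 3-character codes used in the name_{locale} columns.
var twoToThree = map[string]string{
	"ar": "ara", "bn": "ben", "de": "deu", "en": "eng", "el": "ell",
	"fa": "fas", "fr": "fra", "he": "heb", "hi": "hin", "hu": "hun",
	"id": "ind", "it": "ita", "ja": "jpn", "ko": "kor", "nl": "nld",
	"pl": "pol", "pt": "por", "ru": "rus", "es": "spa", "sv": "swe",
	"tr": "tur", "uk": "ukr", "ur": "urd", "vi": "vie", "zh": "zho",
}

// nameField returns the shapefile column for a tag such as "pt-BR" or "zho",
// or false if the language is not one of the 25 shapefile languages.
func nameField(tag string) (string, bool) {
	primary := strings.ToLower(tag)
	if i := strings.IndexAny(primary, "-_"); i >= 0 {
		primary = primary[:i] // keep only the primary language subtag
	}
	if code, ok := twoToThree[primary]; ok {
		return "name_" + code, true
	}
	// Accept 3-character codes directly if they are in the table.
	for _, code := range twoToThree {
		if code == primary {
			return "name_" + code, true
		}
	}
	return "", false
}

func main() {
	fmt.Println(nameField("pt-BR")) // name_por true
}
```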
Concordances
To provide a more ergonomic Shapefile experience, the SPR is pre-joined with the “concordances” table in the SQLite database, and the full WOF property names from the original GeoJSON are shortened:
field | field_full | type | description |
gn_id | gn:id | integer | GeoNames unique identifier |
wd_id | wd:id | string | Wikidata unique identifier |
concord_id | wof:concordances[concord_ke] | string | Official government statistical or census unique ID (the wof:concordances value stored under the concord_ke namespace), useful for data joins |
concord_ke | wof:concordances_official | string | A valid wof:concordances key namespace indicating the source of the official concordance ID |
iso_code | iso:code | string | ISO 3166 country and subdivision codes (International Organization for Standardization) |
hasc_id | hasc:id | string | Hierarchical Administrative Subdivision Codes (HASC) from Statoids |
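Because `concord_id` is only meaningful together with its `concord_ke` namespace, a join against official statistics should filter on both; the same ID string can mean different things in different namespaces. A Go sketch under assumed names: `Feature`, `joinOfficial`, the namespaces, and the sample values are hypothetical, and the records are assumed to have been read from the DBF already.

```go
package main

import "fmt"

// Feature holds the concordance columns of one shapefile record.
type Feature struct {
	ID        int64  // id
	ConcordKe string // concord_ke: namespace of the official ID
	ConcordID string // concord_id: the official ID itself
}

// joinOfficial attaches external statistics keyed by official ID to features
// whose official concordance uses the given namespace.
func joinOfficial(features []Feature, namespace string, stats map[string]float64) map[int64]float64 {
	joined := make(map[int64]float64)
	for _, f := range features {
		// Skip other namespaces and records without an official ID.
		if f.ConcordKe != namespace || f.ConcordID == "" {
			continue
		}
		if v, ok := stats[f.ConcordID]; ok {
			joined[f.ID] = v
		}
	}
	return joined
}

func main() {
	// Hypothetical namespaces, IDs and values, for illustration only.
	features := []Feature{
		{ID: 1, ConcordKe: "example:code", ConcordID: "A01"},
		{ID: 2, ConcordKe: "other:code", ConcordID: "A01"},
	}
	stats := map[string]float64{"A01": 12.5}
	fmt.Println(joinOfficial(features, "example:code", stats)) // map[1:12.5]
}
```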
Hierarchy
To provide a more ergonomic Shapefile experience, the SPR is pre-joined with the “ancestors” table in the SQLite database, with one column per ancestor_placetype:
field | field_full | type | description |
country_id | country | integer | The unique ID of the place’s country (or dependency) ancestor |
region_id | region | integer | The unique ID of the place’s region ancestor |
county_id | county | integer | The unique ID of the place’s county ancestor |
NOTE: Sometimes the country_id property will be backfilled with the dependency ID (shapefile only).
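Since each ancestor has its own column, common roll-ups need no join. A Go sketch that counts features per region; the `Ancestors` type and `countByRegion` are hypothetical, and treating 0 or negative IDs as "no ancestor" is an assumption, not something documented here.

```go
package main

import "fmt"

// Ancestors holds the hierarchy columns of one shapefile record.
type Ancestors struct {
	CountryID int64 // country_id (may be backfilled with a dependency ID)
	RegionID  int64 // region_id
	CountyID  int64 // county_id
}

// countByRegion tallies features per region_id, skipping records without a
// region ancestor. ASSUMPTION: a missing ancestor is stored as 0 or a negative value.
func countByRegion(records []Ancestors) map[int64]int {
	counts := make(map[int64]int)
	for _, r := range records {
		if r.RegionID <= 0 {
			continue
		}
		counts[r.RegionID]++
	}
	return counts
}

func main() {
	// Hypothetical IDs for illustration only.
	recs := []Ancestors{
		{CountryID: 1, RegionID: 10},
		{CountryID: 1, RegionID: 10},
		{CountryID: 1, RegionID: -1},
	}
	fmt.Println(countByRegion(recs)) // map[10:2]
}
```

A per-country roll-up built the same way may group some records under a dependency rather than a country, because of the country_id backfill.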
Other goodies
To provide a more ergonomic Shapefile experience, several GeoJSON properties are extracted from the SPR table in the SQLite database, with the full WOF property names from the original GeoJSON shortened:
field | field_full | type | description |
population | wof:population | integer | An integer value to represent the most current, known population of a place |
placetype_local | label:{lang}_x_preferred_placetype | string | A string value representing the most localized placetype, falling back to wof:placetype_local in English (so US “state” instead of WOF “region”) |
is_funky | mz:is_funky | integer | An integer value used when the record is suspect, bad, or inappropriate but additional confirmation is needed before the feature is deprecated. Records with a value of 1 are recommended to be hidden from map display and search unless explicitly asked for by name. |
min_zoom | mz:min_zoom | float | Float values (though in practice most are integer values) that match the web map zoom schema for when the geometry should be shown. Common range is 0.0 to 18.0, though they can be greater. |
max_zoom | mz:max_zoom | float | Float values (though in practice most are integer values) that match the web map zoom schema for when the geometry should be hidden. Common range is 0.0 to 18.0, though they can be greater. |
min_label | lbl:min_zoom | float | When the feature’s label should first appear. Float values (though in practice most are integer values) that match to web map zoom schema. Common range is 0.0 to 18.0, though they can be greater. |
max_label | lbl:max_zoom | float | When the feature’s label should be removed (or switched to a different representation like exterior ring line labels). Float values (though in practice most are integer values) that match to web map zoom schema. Common range is 0.0 to 18.0, though they can be greater. |
geom_src | src:geom | string | The data source of a record’s geometry. Valid property values are listed in the whosonfirst-sources repository. |
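For map rendering, the is_funky flag and the two zoom ranges combine into a simple visibility test. A Go sketch, assuming all four zoom columns are populated and treating each max value as exclusive (shown from the min zoom until hidden at the max zoom); the `Display` type and helper names are hypothetical. Name search is out of scope here, since funky records may still be returned when asked for by name.

```go
package main

import "fmt"

// Display holds the display-related columns of one shapefile record.
type Display struct {
	IsFunky  int64   // is_funky
	MinZoom  float64 // min_zoom
	MaxZoom  float64 // max_zoom
	MinLabel float64 // min_label
	MaxLabel float64 // max_label
}

// showGeometry reports whether the geometry should be drawn at zoom z:
// funky records are hidden, otherwise it is shown from min_zoom until max_zoom.
// ASSUMPTION: max_zoom is exclusive and both bounds are populated.
func showGeometry(d Display, z float64) bool {
	return d.IsFunky != 1 && z >= d.MinZoom && z < d.MaxZoom
}

// showLabel applies the same rule to the label range (min_label, max_label).
func showLabel(d Display, z float64) bool {
	return d.IsFunky != 1 && z >= d.MinLabel && z < d.MaxLabel
}

func main() {
	// Hypothetical values for illustration.
	d := Display{MinZoom: 4, MaxZoom: 12, MinLabel: 6, MaxLabel: 12}
	fmt.Println(showGeometry(d, 5), showLabel(d, 5)) // true false
}
```

Keeping geometry and label checks separate lets a renderer show a feature's shape before its label appears, which matches the separate min_zoom and min_label columns.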