Shapefiles
Introduced in 2023, shapefile downloads for the Who’s On First gazetteer are available as per-country ZIP archives in three bundles: admin (with country, region, county, locality, neighbourhood, and other placetypes), postalcode, and constituency.
The Shapefile distribution's properties are a superset of the minimum set of standardized place response (SPR) properties, adapted to the limitations of the popular format: a single geometry type per file and short DBF column names. Select commonly used properties from the SQLite distribution are added and denormalized into a single DBF table.
type ShapefileSchema interface {
id() integer
parent_id() integer
name() string
placetype() string
country() string
repo() string
lat() float
lon() float
min_lat() float
min_lon() float
max_lat() float
max_lon() float
modified() date
name_ara() string
name_ben() string
name_deu() string
name_eng() string
name_ell() string
name_fas() string
name_fra() string
name_heb() string
name_hin() string
name_hun() string
name_ind() string
name_ita() string
name_jpn() string
name_kor() string
name_nld() string
name_pol() string
name_por() string
name_rus() string
name_spa() string
name_swe() string
name_tur() string
name_ukr() string
name_urd() string
name_vie() string
name_zho() string
gn_id() integer
wd_id() string
concord_id() string
concord_ke() string
iso_code() string
hasc_id() string
country_id() integer
region_id() integer
county_id() integer
population() integer
placetype_local() string
is_funky() integer
min_zoom() float
max_zoom() float
min_label() float
max_label() float
geom_src() string
}
Shapefile ZIP archive file layout
Since shapefiles don’t support mixed geometry types, there can be two shapefiles for every WOF placetype, such as “locality-point” and “locality-polygon”. The individual file components of each shapefile (with extensions such as shp, shx, dbf, prj, and cpg) are collected into a single compressed ZIP archive.
For example, the admin bundle for the United States (country code “us”) would have two shapefiles for the locality placetype, one per geometry type:
- whosonfirst-data-admin-us-locality-polygon.shp
- whosonfirst-data-admin-us-locality-point.shp
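The naming convention above can be taken apart programmatically. The following is a minimal sketch that assumes every shapefile follows the pattern `whosonfirst-data-{bundle}-{country}-{placetype}-{geometry}.shp`; that pattern is inferred only from the two example file names above, and the helper names `parse_shapefile_name` and `group_by_placetype` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from the two example file names above:
# whosonfirst-data-{bundle}-{country}-{placetype}-{geometry}.shp
SHAPEFILE_NAME = re.compile(
    r"^whosonfirst-data-(?P<bundle>[a-z]+)-(?P<country>[a-z]{2})-"
    r"(?P<placetype>[a-z]+)-(?P<geometry>point|polygon)\.shp$"
)


class ShapefileName(NamedTuple):
    bundle: str
    country: str
    placetype: str
    geometry: str


def parse_shapefile_name(filename: str) -> Optional[ShapefileName]:
    """Split a shapefile name into its parts, or return None if it does not match."""
    match = SHAPEFILE_NAME.match(filename)
    if match is None:
        return None
    return ShapefileName(**match.groupdict())


def group_by_placetype(filenames):
    """Map each placetype to the sorted list of geometry types found for it."""
    grouped = {}
    for filename in filenames:
        parsed = parse_shapefile_name(filename)
        if parsed is not None:  # skip .dbf, .shx, .prj, .cpg and anything else
            grouped.setdefault(parsed.placetype, []).append(parsed.geometry)
    return {placetype: sorted(geoms) for placetype, geoms in grouped.items()}
```

The list of file names could come from the standard library's `zipfile.ZipFile(path).namelist()` for a downloaded archive.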
Shapefile Data Schema
The shapefile format includes “standard place response” (SPR) properties and additional properties. Because Shapefile limits DBF field names to 10 characters, which would truncate some column names ambiguously, we rename them explicitly. In the tables below, the “field” column gives the shapefile field name, and “field_full” gives the full SPR field name used in the other formats, or the full WOF property name. Some shapefile properties coalesce values from multiple WOF source properties.
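To illustrate why naive truncation is ambiguous, here is a small sketch that cuts a few full WOF property names to 10 characters and reports any that collide. The function name `truncation_collisions` is hypothetical, and the sketch is not how the export itself renames fields.

```python
from collections import defaultdict

DBF_FIELD_NAME_LIMIT = 10  # maximum length of a DBF field name


def truncation_collisions(names, limit=DBF_FIELD_NAME_LIMIT):
    """Return {truncated_name: [full names]} for names that collide once truncated."""
    buckets = defaultdict(list)
    for name in names:
        buckets[name[:limit]].append(name)
    return {short: full for short, full in buckets.items() if len(full) > 1}


# Full property names taken from the "field_full" columns in this document.
full_names = [
    "wof:concordances",
    "wof:concordances_official",
    "mz:min_zoom",
    "mz:max_zoom",
]

collisions = truncation_collisions(full_names)
# Both concordance properties collapse to the same 10-character prefix,
# which is the kind of ambiguity that explicit renaming avoids.
print(collisions)
```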
Standard Place Response (SPR) fields
The Shapefile export includes most, but not all, of the core SPR properties.
| field | field_full | type | description |
| --- | --- | --- | --- |
| id | Id | integer | The unique ID of the place |
| parent_id | ParentId | integer | The unique parent ID of the place. Negative values indicate “complicated”. |
| name | Name | string | The default name of the place (mostly English, mostly ASCII-7) |
| placetype | Placetype | string | The Who’s On First placetype of the place |
| country | Country | string | The two-letter country code of the place |
| repo | Repo | string | The (Git) repository name where the source record for the place is stored. |
| lat | Latitude | float | The latitude for the principal centroid (typically “label”) of the place |
| lon | Longitude | float | The longitude for the principal centroid (typically “label”) of the place |
| min_lat | MinLatitude | float | The minimum latitude of the bounding box of the place |
| min_lon | MinLongitude | float | The minimum longitude of the bounding box of the place |
| max_lat | MaxLatitude | float | The maximum latitude of the bounding box of the place |
| max_lon | MaxLongitude | float | The maximum longitude of the bounding box of the place |
| modified | LastModified | date | The Unix timestamp indicating when the place was last modified |
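As a sketch of how the coordinate fields relate to each other, the snippet below builds a bounding box from a record and checks whether the principal centroid falls inside it. It assumes a record is a plain dictionary keyed by the shapefile field names, as most shapefile readers can provide; `BoundingBox`, `bounding_box`, and `centroid_in_bbox` are hypothetical names, and bounding boxes that cross the antimeridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        """True when the point lies inside or on the edge of the box."""
        return (
            self.min_lon <= lon <= self.max_lon
            and self.min_lat <= lat <= self.max_lat
        )


def bounding_box(record: dict) -> BoundingBox:
    """Build a BoundingBox from the min_/max_ fields of a record."""
    return BoundingBox(
        min_lon=record["min_lon"],
        min_lat=record["min_lat"],
        max_lon=record["max_lon"],
        max_lat=record["max_lat"],
    )


def centroid_in_bbox(record: dict) -> bool:
    """Check that the lat/lon centroid lies within the record's bounding box."""
    return bounding_box(record).contains(record["lon"], record["lat"])
```

A check like this can serve as a quick sanity test after loading a shapefile into another tool.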
Names
To provide a more ergonomic Shapefile experience, the SPR is pre-joined to the names table and includes 25 localized names, when available, for:
Arabic, Bengali, Chinese (simplified and/or traditional), Dutch, English, Farsi, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Ukrainian, Urdu, and Vietnamese
Who’s On First uses RFC 5646 / BCP 47 language tags for names. The Shapefile exposes the following preferred locales as name_{locale} properties, where {locale} is the 3-character language code (so name_eng for English).
The list of supported Shapefile languages is adapted from Natural Earth and Tilezen’s list of core languages. Arabic, Chinese, English, French, Russian, and Spanish are used by the United Nations for meetings and official documents. The other languages listed are either proposed as official languages of the United Nations (Bengali, Hindi, Portuguese, and Turkish) or frequently used in OpenStreetMap, Who’s On First, or Wikipedia.
NOTE: Several hundred other languages are supported in the SQLite distributions in the “names” table.
Languages and their name fields
| field | 3-char | 2-char | language | native script |
| --- | --- | --- | --- | --- |
| name_ara | ara | ar | Arabic | العربية |
| name_ben | ben | bn | Bengali | বাংলা |
| name_deu | deu | de | German | Deutsch |
| name_eng | eng | en | English | English |
| name_ell | ell | el | Greek | ελληνικά |
| name_fas | fas | fa | Farsi | فارسی |
| name_fra | fra | fr | French | français |
| name_heb | heb | he | Hebrew | עִבְרִית |
| name_hin | hin | hi | Hindi | हिन्दी |
| name_hun | hun | hu | Hungarian | magyar |
| name_ind | ind | id | Indonesian | Bahasa Indonesia |
| name_ita | ita | it | Italian | italiano |
| name_jpn | jpn | ja | Japanese | 日本語 |
| name_kor | kor | ko | Korean | 한국어 |
| name_nld | nld | nl | Dutch | Nederlands |
| name_pol | pol | pl | Polish | Polski |
| name_por | por | pt | Portuguese | Português |
| name_rus | rus | ru | Russian | Русский |
| name_spa | spa | es | Spanish | español |
| name_swe | swe | sv | Swedish | Svenska |
| name_tur | tur | tr | Turkish | Türkçe |
| name_ukr | ukr | uk | Ukrainian | українська |
| name_urd | urd | ur | Urdu | اردو |
| name_vie | vie | vi | Vietnamese | Tiếng Việt |
| name_zho | zho | zh | Chinese | 中文 |
NOTE: Chinese names may include a mix of simplified and/or traditional Chinese characters. The SQLite file tries to imply Simplified or Traditional characters with additional country tags.
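A common use of the name fields is picking a localized label with a fallback to the default name. The sketch below assumes a record is a dictionary keyed by shapefile field names and that missing names arrive as empty strings or `None`; the helper `localized_name` is hypothetical. The code mapping is copied from the table above.

```python
# 2-character to 3-character language codes, copied from the table above.
LANG_2_TO_3 = {
    "ar": "ara", "bn": "ben", "de": "deu", "en": "eng", "el": "ell",
    "fa": "fas", "fr": "fra", "he": "heb", "hi": "hin", "hu": "hun",
    "id": "ind", "it": "ita", "ja": "jpn", "ko": "kor", "nl": "nld",
    "pl": "pol", "pt": "por", "ru": "rus", "es": "spa", "sv": "swe",
    "tr": "tur", "uk": "ukr", "ur": "urd", "vi": "vie", "zh": "zho",
}


def localized_name(record: dict, lang: str):
    """Return the name for `lang` (2- or 3-character code), else the default name."""
    code = LANG_2_TO_3.get(lang, lang)  # accept either code length
    value = record.get(f"name_{code}")
    if value:  # skips None and empty strings
        return value
    return record.get("name")


# Illustrative record, not taken from the actual data.
record = {"name": "Germany", "name_deu": "Deutschland", "name_fra": ""}
print(localized_name(record, "de"))  # uses name_deu
print(localized_name(record, "fr"))  # empty, so falls back to name
```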
Concordances
To provide a more ergonomic Shapefile experience, the SPR is pre-joined with the “concordances” table in the SQLite database, and the full WOF property names from the original GeoJSON are shortened:
| field | field_full | type | description |
| --- | --- | --- | --- |
| gn_id | gn:id | integer | GeoNames unique identifier |
| wd_id | wd:id | string | Wikidata unique identifier |
| concord_id | wof:concordances[“concord_ke”] | string | Official government statistical or census unique ID, useful for data joins |
| concord_ke | wof:concordances_official | string | A valid wof:concordances key namespace indicating the source of the official concordance ID |
| iso_code | iso:code | string | International Organization for Standardization (ISO) country and subdivision codes |
| hasc_id | hasc:id | string | Statoids Hierarchical Administrative Subdivision Codes (HASC) |
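Since concord_id is described as useful for data joins, here is a minimal sketch of building a join index keyed by the (concord_ke, concord_id) pair. It assumes records are dictionaries keyed by shapefile field names and that empty values arrive as empty strings or `None`; the function names and the sample namespace `example:id` are made up for illustration.

```python
def official_concordance(record: dict):
    """Return (namespace, identifier) for the official concordance, or None."""
    namespace = (record.get("concord_ke") or "").strip()
    identifier = (record.get("concord_id") or "").strip()
    if not namespace or not identifier:
        return None
    return namespace, identifier


def index_by_official_id(records):
    """Build {(namespace, identifier): record} for joining against external tables."""
    index = {}
    for record in records:
        key = official_concordance(record)
        if key is not None:
            index[key] = record
    return index


# Illustrative records; the namespace and identifiers are invented.
records = [
    {"id": 1, "concord_ke": "example:id", "concord_id": "0001"},
    {"id": 2, "concord_ke": "", "concord_id": ""},
]
index = index_by_official_id(records)
```

Keeping the namespace in the key avoids mixing identifiers issued by different sources that happen to share a value.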
Hierarchy
To provide a more ergonomic Shapefile experience, the SPR is pre-joined with the “ancestors” table in the SQLite database, keyed off ancestor_placetype:
| field | field_full | type | description |
| --- | --- | --- | --- |
| country_id | country | integer | The unique ID of the place’s country (or dependency) ancestor |
| region_id | region | integer | The unique ID of the place’s region ancestor |
| county_id | county | integer | The unique ID of the place’s county ancestor |
NOTE: Sometimes the country_id property will be backfilled with the dependency ID (shapefile only).
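The ancestor IDs can be resolved into a simple breadcrumb when the relevant country, region, and county records are also loaded. The sketch below assumes a lookup dictionary from place ID to name has been built separately, and that a missing ancestor is stored as `None`, 0, or a negative value; both assumptions, the helper `ancestor_names`, and the sample IDs are illustrative only.

```python
# Nearest ancestor first.
ANCESTOR_FIELDS = ("county_id", "region_id", "country_id")


def ancestor_names(record: dict, names_by_id: dict):
    """Return ancestor names from nearest to farthest, skipping missing ones."""
    names = []
    for field in ANCESTOR_FIELDS:
        ancestor_id = record.get(field)
        # Assumption: only positive integers are usable ancestor IDs.
        if isinstance(ancestor_id, int) and ancestor_id > 0:
            name = names_by_id.get(ancestor_id)
            if name is not None:
                names.append(name)
    return names


# Illustrative IDs and names, not real Who's On First identifiers.
names_by_id = {10: "Example Country", 20: "Example Region", 30: "Example County"}
locality = {"id": 40, "county_id": 30, "region_id": 20, "country_id": 10}
print(", ".join(ancestor_names(locality, names_by_id)))
```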
Other goodies
To provide a more ergonomic Shapefile experience, several GeoJSON properties are extracted from the SPR table in the SQLite database, and their full WOF property names from the original GeoJSON are shortened:
| field | field_full | type | description |
| --- | --- | --- | --- |
| population | wof:population | integer | An integer value to represent the most current, known population of a place |
| placetype_local | label:{lang}_x_preferred_placetype | string | A string value to represent the most localized placetype, falling back to wof:placetype_local in English (so US “state” instead of WOF “region”) |
| is_funky | mz:is_funky | integer | An integer value used when the record is suspect, bad, or inappropriate but additional confirmation is needed before the feature is deprecated. Records with a 1 value are recommended to be hidden from map display and search unless explicitly asked for by name. |
| min_zoom | mz:min_zoom | float | Float values (though in practice most are integer values) that match to web map zoom schema for when the geometry should be shown. Common range is 0.0 to 18.0, though they can be greater. |
| max_zoom | mz:max_zoom | float | Float values (though in practice most are integer values) that match to web map zoom schema for when the geometry should be hidden. Common range is 0.0 to 18.0, though they can be greater. |
| min_label | lbl:min_zoom | float | When the feature’s label should first appear. Float values (though in practice most are integer values) that match to web map zoom schema. Common range is 0.0 to 18.0, though they can be greater. |
| max_label | lbl:max_zoom | float | When the feature’s label should be removed (or switched to a different representation like exterior ring line labels). Float values (though in practice most are integer values) that match to web map zoom schema. Common range is 0.0 to 18.0, though they can be greater. |
| geom_src | src:geom | string | The data source of a record’s geometry. Valid property values are listed in the whosonfirst-sources repository. |
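The zoom and is_funky fields can be combined into a simple display filter. The following sketch assumes a record is a dictionary keyed by shapefile field names, and it treats min_zoom as inclusive and max_zoom as exclusive; that boundary handling is an assumption, not something the field descriptions specify. The helper `is_visible` is hypothetical.

```python
def is_visible(record: dict, zoom: float, include_funky: bool = False) -> bool:
    """Decide whether a feature's geometry should be drawn at the given zoom."""
    # Records flagged as funky are hidden unless explicitly requested.
    if record.get("is_funky") == 1 and not include_funky:
        return False
    min_zoom = record.get("min_zoom")
    max_zoom = record.get("max_zoom")
    # Assumption: the geometry appears at min_zoom (inclusive) ...
    if min_zoom is not None and zoom < min_zoom:
        return False
    # ... and is hidden from max_zoom onward (exclusive upper bound).
    if max_zoom is not None and zoom >= max_zoom:
        return False
    return True


# Illustrative record, not taken from the actual data.
feature = {"min_zoom": 4.0, "max_zoom": 18.0, "is_funky": 0}
visible_zooms = [z for z in range(0, 20) if is_visible(feature, z)]
```

The same pattern could be applied to min_label and max_label to decide when a feature's label should be drawn.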