Shapefiles

Introduced in 2023, shapefile downloads for the Who’s On First gazetteer are available as per-country ZIP archives including admin (with country, region, county, locality, neighbourhood & more placetypes), postalcode, and constituency bundles.

The properties in the Shapefile distribution are a superset of the minimum set of standardized place response (SPR) properties, adapted to the format's single geometry type per file and its short DBF column names. Select commonly used properties from the SQLite distribution are added and denormalized into a single DBF table.

type ShapefileSchema interface {
	id() integer
	parent_id() integer
	name() string
	placetype() string
	country() string
	repo() string
	lat() float
	lon() float
	min_lat() float
	min_lon() float
	max_lat() float
	max_lon() float
	modified() date
	name_ara() string
	name_ben() string
	name_deu() string
	name_eng() string
	name_ell() string
	name_fas() string
	name_fra() string
	name_heb() string
	name_hin() string
	name_hun() string
	name_ind() string
	name_ita() string
	name_jpn() string
	name_kor() string
	name_nld() string
	name_pol() string
	name_por() string
	name_rus() string
	name_spa() string
	name_swe() string
	name_tur() string
	name_ukr() string
	name_urd() string
	name_vie() string
	name_zho() string
	gn_id() integer
	wd_id() string
	concord_id() string
	concord_ke() string
	iso_code() string
	hasc_id() string
	country_id() integer
	region_id() integer
	county_id() integer
	population() integer
	placetype_local() string
	is_funky() integer
	min_zoom() float
	max_zoom() float
	min_label() float
	max_label() float
	geom_src() string
}

Shapefile ZIP archive file layout

Because a shapefile cannot mix geometry types, each WOF placetype can have up to two shapefiles, such as “locality-point” and “locality-polygon”. The individual component files of these shapefiles (with extensions such as shp, shx, dbf, prj, and cpg) are collected into a single compressed ZIP archive.

For example, the US admin archive would have two shapefiles for the locality placetype, one per geometry type:

  • whosonfirst-data-admin-us-locality-polygon.shp
  • whosonfirst-data-admin-us-locality-point.shp
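
The layout described above can be inspected with Python's standard library alone. The following is a minimal sketch, not an official tool: the function names are hypothetical, and `split_layer_name` assumes that the last two hyphen-separated tokens of a file name are the placetype and the geometry type, as in the two example file names.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_shapefile_components(archive):
    """Map each shapefile base name to the sorted extensions found for it.

    `archive` may be a path or a binary file-like object.
    """
    components = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            path = PurePosixPath(member)
            if path.suffix:  # skip directory entries and extensionless files
                components[path.stem].append(path.suffix.lstrip(".").lower())
    return {stem: sorted(exts) for stem, exts in components.items()}


def split_layer_name(stem):
    """Split '...-locality-polygon' into ('locality', 'polygon').

    Assumption: the last two hyphen-separated tokens are the placetype
    and the geometry type, as in the example file names.
    """
    _prefix, placetype, geometry = stem.rsplit("-", 2)
    return placetype, geometry
```

A complete layer should report all of its component extensions under one base name, which makes a missing dbf or prj file easy to spot before loading the data.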

Shapefile Data Schema

The shapefile format includes “standard place response” (SPR) properties plus additional properties. DBF field names are limited to 10 characters, which would truncate some column names ambiguously, so those fields are renamed explicitly. In the tables below, the “field” column gives the Shapefile field name and “field_full” gives the full SPR field name used in the other formats, or the full WOF property name. Some shapefile properties coalesce values from multiple WOF source properties.
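
To see why explicit renaming is needed, the sketch below truncates property names to 10 characters and reports which ones collide. It is an illustration only: the function name is hypothetical, and real GIS tools may truncate differently (for example by appending numeric suffixes) rather than cutting names naively.

```python
from collections import defaultdict

DBF_FIELD_NAME_LIMIT = 10  # maximum DBF field name length


def truncation_collisions(names, limit=DBF_FIELD_NAME_LIMIT):
    """Return {truncated_name: [full names]} for names that collide."""
    groups = defaultdict(list)
    for name in names:
        groups[name[:limit]].append(name)
    # Keep only truncated names shared by more than one full name.
    return {short: full for short, full in groups.items() if len(full) > 1}


collisions = truncation_collisions(
    ["wof:concordances", "wof:concordances_official", "wof:population"]
)
print(collisions)
```

Both concordance properties shorten to the same 10 characters, so short names such as concord_id and concord_ke keep them distinguishable.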

Standard Place Response (SPR) fields

The Shapefile distribution exports most, but not all, of the core SPR properties.

| field | field_full | type | description |
| --- | --- | --- | --- |
| id | Id | integer | The unique ID of the place |
| parent_id | ParentId | integer | The unique parent ID of the place. Negative values indicate “complicated”. |
| name | Name | string | The default name of the place (mostly English, mostly ASCII-7) |
| placetype | Placetype | string | The Who’s On First placetype of the place |
| country | Country | string | The two-letter country code of the place |
| repo | Repo | string | The (Git) repository name where the source record for the place is stored. |
| lat | Latitude | float | The latitude for the principal centroid (typically “label”) of the place |
| lon | Longitude | float | The longitude for the principal centroid (typically “label”) of the place |
| min_lat | MinLatitude | float | The minimum latitude of the bounding box of the place |
| min_lon | MinLongitude | float | The minimum longitude of the bounding box of the place |
| max_lat | MaxLatitude | float | The maximum latitude of the bounding box of the place |
| max_lon | MaxLongitude | float | The maximum longitude of the bounding box of the place |
| modified | LastModified | date | The Unix timestamp indicating when the place was last modified |
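
As a rough illustration of how these fields relate to each other, the sketch below checks that a record's centroid falls inside its bounding box and flags a “complicated” parent. The function names and the sample record are made up for this example. The bounding box check also assumes the box does not cross the antimeridian, which is not addressed here.

```python
def centroid_in_bbox(record):
    """True when (lat, lon) lies inside the record's bounding box."""
    return (
        record["min_lat"] <= record["lat"] <= record["max_lat"]
        and record["min_lon"] <= record["lon"] <= record["max_lon"]
    )


def parent_is_complicated(record):
    """Negative parent_id values are documented as meaning "complicated"."""
    return record["parent_id"] < 0


# Placeholder record with made-up values, for illustration only.
example = {
    "id": 1, "parent_id": -1, "lat": 10.5, "lon": 20.5,
    "min_lat": 10.0, "min_lon": 20.0, "max_lat": 11.0, "max_lon": 21.0,
}
print(centroid_in_bbox(example), parent_is_complicated(example))
```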

Names

To provide a more ergonomic Shapefile experience, the SPR is pre-joined to the names table and includes 25 localized names, when available, for:

Arabic, Bengali, Chinese (simplified and/or traditional), Dutch, English, Farsi, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, Turkish, Ukrainian, Urdu, and Vietnamese

Who’s On First follows the RFC 5646 / BCP 47 conventions for tagging the language of a name. In the Shapefile distribution, each of the preferred locales below is identified by a 3-character language code and exposed as a name_{locale} property (for example, name_eng for English).

The list of supported Shapefile languages is adapted from Natural Earth and Tilezen’s list of core languages. Arabic, Chinese, English, French, Russian and Spanish are used by the United Nations for meetings and official documents. The other languages listed are either proposed as an official language of the United Nations (Bengali, Hindi, Portuguese, and Turkish) or frequently used in OpenStreetMap, Who’s On First, or Wikipedia.

NOTE: Several hundred other languages are supported in the SQLite distributions in the “names” table.

Languages and their name fields

| field | 3-char | 2-char | language | native script |
| --- | --- | --- | --- | --- |
| name_ara | ara | ar | Arabic | العربية |
| name_ben | ben | bn | Bengali | বাংলা |
| name_deu | deu | de | German | Deutsch |
| name_eng | eng | en | English | English |
| name_ell | ell | el | Greek | ελληνικά |
| name_fas | fas | fa | Farsi | فارسی |
| name_fra | fra | fr | French | français |
| name_heb | heb | he | Hebrew | עִבְרִית |
| name_hin | hin | hi | Hindi | हिन्दी |
| name_hun | hun | hu | Hungarian | magyar |
| name_ind | ind | id | Indonesian | Bahasa Indonesia |
| name_ita | ita | it | Italian | italiano |
| name_jpn | jpn | ja | Japanese | 日本語 |
| name_kor | kor | ko | Korean | 한국어 |
| name_nld | nld | nl | Dutch | Nederlands |
| name_pol | pol | pl | Polish | Polski |
| name_por | por | pt | Portuguese | Português |
| name_rus | rus | ru | Russian | Русский |
| name_spa | spa | es | Spanish | español |
| name_swe | swe | sv | Swedish | Svenska |
| name_tur | tur | tr | Turkish | Türkçe |
| name_ukr | ukr | uk | Ukrainian | українська |
| name_urd | urd | ur | Urdu | اردو |
| name_vie | vie | vi | Vietnamese | Tiếng Việt |
| name_zho | zho | zh | Chinese | 中文 |
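
Because localized names are only present “when available”, a consumer usually needs a fallback. The sketch below looks up a name_{locale} value by either its 2-character or 3-character code and falls back to the default name field. The function name is hypothetical, and treating an empty string as “missing” is an assumption about how absent values appear in the DBF table.

```python
# 2-character to 3-character codes, copied from the table above.
TWO_TO_THREE_CHAR = {
    "ar": "ara", "bn": "ben", "de": "deu", "en": "eng", "el": "ell",
    "fa": "fas", "fr": "fra", "he": "heb", "hi": "hin", "hu": "hun",
    "id": "ind", "it": "ita", "ja": "jpn", "ko": "kor", "nl": "nld",
    "pl": "pol", "pt": "por", "ru": "rus", "es": "spa", "sv": "swe",
    "tr": "tur", "uk": "ukr", "ur": "urd", "vi": "vie", "zh": "zho",
}


def localized_name(record, lang):
    """Return the name for `lang`, falling back to the default `name`.

    `lang` may be a 2-character code ("de") or a 3-character code ("deu").
    Assumption: a missing localized name is None, absent, or an empty string.
    """
    code = TWO_TO_THREE_CHAR.get(lang, lang)
    value = record.get(f"name_{code}")
    return value if value else record["name"]


record = {"name": "Munich", "name_deu": "München", "name_fra": ""}
print(localized_name(record, "de"), localized_name(record, "fr"))
```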

NOTE: Chinese names may include a mix of simplified and/or traditional Chinese characters. The SQLite file tries to imply Simplified or Traditional characters with additional country tags.

Concordances

To provide a more ergonomic Shapefile experience, the SPR is pre-joined with the “concordances” table in the SQLite database, and the full WOF property names from the original GeoJSON are shortened:

| field | field_full | type | description |
| --- | --- | --- | --- |
| gn_id | gn:id | integer | GeoNames unique identifier |
| wd_id | wd:id | string | Wikidata unique identifier |
| concord_id | wof:concordances[“concord_ke”] | string | Official government statistical or census unique ID, useful for data joins |
| concord_ke | wof:concordances_official | string | A valid wof:concordances key namespace indicating the source of the official concordance ID |
| iso_code | iso:code | string | International Organization for Standardization (ISO) country and subdivision code |
| hasc_id | hasc:id | string | Statoids Hierarchical Administrative Subdivision Code (HASC) |
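
Since concord_id is described as useful for data joins, one plausible pattern is to index records by that ID for a single concord_ke namespace and then look up rows from an external table. This is a sketch under stated assumptions: the function name is hypothetical, and the namespace string "example:id" is a placeholder, not a real wof:concordances key.

```python
def index_by_official_id(records, key_namespace):
    """Index records by concord_id for one concord_ke namespace.

    Records from other namespaces, or without an official ID, are skipped.
    """
    index = {}
    for record in records:
        if record.get("concord_ke") == key_namespace and record.get("concord_id"):
            index[record["concord_id"]] = record
    return index


# Placeholder rows with made-up IDs and a made-up namespace.
records = [
    {"id": 1, "concord_id": "A1", "concord_ke": "example:id"},
    {"id": 2, "concord_id": "B2", "concord_ke": "other:id"},
    {"id": 3, "concord_id": "", "concord_ke": "example:id"},
]
index = index_by_official_id(records, "example:id")
print(sorted(index))
```

Filtering on concord_ke first matters because the same ID string could in principle appear under different official sources.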

Hierarchy

To provide a more ergonomic Shapefile experience, the SPR is pre-joined with the “ancestors” table in the SQLite database, keyed off ancestor_placetype:

| field | field_full | type | description |
| --- | --- | --- | --- |
| country_id | country | integer | The unique ID of the place’s country (or dependency) ancestor |
| region_id | region | integer | The unique ID of the place’s region ancestor |
| county_id | county | integer | The unique ID of the place’s county ancestor |

NOTE: Sometimes the country_id property will be backfilled with the dependency ID (shapefile only).
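
With ancestor IDs denormalized onto each row, simple roll-ups need no extra join. The sketch below counts places of one placetype per region. The function name and sample rows are made up, and representing a missing ancestor as None is an assumption, since a DBF table may encode missing values differently.

```python
from collections import Counter


def count_by_region(records, placetype="locality"):
    """Count records of `placetype` per region_id, skipping missing regions."""
    return Counter(
        r["region_id"]
        for r in records
        if r["placetype"] == placetype and r.get("region_id") is not None
    )


# Placeholder rows with made-up IDs.
rows = [
    {"placetype": "locality", "region_id": 10},
    {"placetype": "locality", "region_id": 10},
    {"placetype": "locality", "region_id": 20},
    {"placetype": "county", "region_id": 10},
    {"placetype": "locality", "region_id": None},
]
print(count_by_region(rows))
```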

Other goodies

To provide a more ergonomic Shapefile experience, several GeoJSON properties are extracted from the SPR table in the SQLite database, and their full WOF property names from the original GeoJSON are shortened:

| field | field_full | type | description |
| --- | --- | --- | --- |
| population | wof:population | integer | The most current known population of a place |
| placetype_local | label:{lang}_x_preferred_placetype | string | The most localized placetype, falling back to wof:placetype_local in English (so US “state” instead of WOF “region”) |
| is_funky | mz:is_funky | integer | Set when the record is suspect, bad, or inappropriate but additional confirmation is needed before the feature is deprecated. Records with a value of 1 should be hidden from map display and search unless explicitly asked for by name. |
| min_zoom | mz:min_zoom | float | The web map zoom level at which the geometry should first be shown. Float values (though in practice most are integers). Common range is 0.0 to 18.0, though values can be greater. |
| max_zoom | mz:max_zoom | float | The web map zoom level at which the geometry should be hidden. Float values (though in practice most are integers). Common range is 0.0 to 18.0, though values can be greater. |
| min_label | lbl:min_zoom | float | The web map zoom level at which the feature’s label should first appear. Float values (though in practice most are integers). Common range is 0.0 to 18.0, though values can be greater. |
| max_label | lbl:max_zoom | float | The web map zoom level at which the feature’s label should be removed (or switched to a different representation, such as exterior ring line labels). Float values (though in practice most are integers). Common range is 0.0 to 18.0, though values can be greater. |
| geom_src | src:geom | string | The data source of a record’s geometry. Valid property values are listed in the whosonfirst-sources repository. |
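
The zoom and is_funky fields can be combined into a display filter. The following is a minimal sketch rather than a prescribed rule: the function name is hypothetical, and treating min_zoom as inclusive and max_zoom as exclusive is an assumption based on the “shown” and “hidden” wording above.

```python
def should_draw(record, zoom, include_funky=False):
    """Decide whether a record's geometry should be drawn at `zoom`.

    Assumptions: min_zoom is inclusive, max_zoom is exclusive, and a
    missing (None) bound means "no limit" on that side.
    """
    if record.get("is_funky") == 1 and not include_funky:
        return False
    min_zoom = record.get("min_zoom")
    max_zoom = record.get("max_zoom")
    if min_zoom is not None and zoom < min_zoom:
        return False
    if max_zoom is not None and zoom >= max_zoom:
        return False
    return True


# Placeholder records with made-up zoom values.
place = {"is_funky": 0, "min_zoom": 8.0, "max_zoom": 18.0}
funky = {"is_funky": 1, "min_zoom": 8.0, "max_zoom": 18.0}
print(should_draw(place, 10), should_draw(funky, 10))
```

The same pattern could be applied to min_label and max_label to decide when a label is shown.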