The Who's On First (WOF) project is not pretending to be the authority of truth, but rather a home for data from various sources and a project that we hope generates discussion. The WOF gazetteer houses data for many geographies, including counties (which parent cities) and cities (which parent microhoods, neighbourhoods, and macrohoods). Neighbourhoods are the places where we live and work in cities; the more neighbourhood data, the better.
Photo Credit: Travis Wise, Flickr
Our Audience: Those who care about neighbourhoods and have a few hours to a few days to spend diving into QGIS.
Your Skills/Experience: This tutorial was written for the intermediate to advanced GIS user. While it tries to explain the editing process in detail, some information may not be immediately clear to a beginning user.
Why are neighbourhoods important to mapping?
It is important to understand where placetypes fall within the WOF hierarchy. Neighbourhoods, macrohoods, and microhoods are described below:
This tutorial addresses the issues we found with San Francisco's neighbourhoods and describes the workflow we took in fixing them. In the end, we hope you can follow along and edit neighbourhoods for your own city!
What are your options when updating neighbourhoods?
Key terms:
Check out these key terms if you have any questions about the vocabulary used in this guide.
We received a report in the whosonfirst-data repository that the shape of San Francisco’s Golden Gate Park neighbourhood was both too small and extended into adjacent neighbourhoods. While researching online sources for a better shape, we noticed that most adjacent neighbourhood shapes in WOF could also be improved to align better with the road network and local expectations. After researching neighbourhood shapes online, we downloaded neighbourhood shapes for San Francisco from SF OpenData, a city data website, to compare with the neighbourhood records in our WOF repository. We then filed a new issue to handle all neighbourhood updates for San Francisco.
Typically, Who’s On First sources Quattroshapes geometries for most neighbourhoods around the globe. However, many neighbourhoods in the United States, including San Francisco, source their default geometry from Zetashapes. The Zetashapes project follows the same basic principles as Quattroshapes, but builds shapes up from Census 2010 features and can draw shapes that are too big, too small, or just plain weird. We’ve seen problems with shapes extending into the water and far out into neighbouring rural areas. This technique is responsible for the issues that we are correcting in San Francisco.
Drawing neighbourhood shapes is a tricky business. Strangers generally agree on what a neighbourhood is named and its rough shape, but even good friends can argue vehemently about where one neighbourhood ends and another begins, and even about whether there are hard edges between neighbourhoods or whether they should overlap. Recognizing this, Who’s On First allows multiple alternate geometries for a place, but for practical reasons we need to set just one shape as the default geometry.
To clean up our neighbourhood geometries, we needed to take five steps:
While specifics listed in this tutorial may reference San Francisco, our hope is that you will be able to follow along with these steps to update the neighbourhood records in your locality.
In this section, you will either:
OR
git checkout command in your terminal to clone necessary repositories
The Bundler Tool quickly gathers a collection of descendant records in GeoJSON format given the parent record's ID. Simply click the "Download Descendants" link on any record's page in the Spelunker.
This is the "quick and easy" method of gathering neighbourhood records; however, this tool only gathers up to 500 WOF records. For larger queries or markets, you will need to clone the necessary repositories and collect a .geojson file. Instructions below:
Install setuptools for Python by downloading Python 2.7 (or a more current version) and GDAL 2.1 by downloading QGIS 2.14 (or a more current version).
Run git checkout on the WOF Data repository, WOF Properties repository, and WOF Utils repository.
Run the install script in the whosonfirst-utils repository.
Open the WOF-csv-to-feature-collection.py script in your Utils repo. Update line 63 with your local filepath.
Once complete, entering the following string in the terminal from the whosonfirst-utils repository's scripts folder allows us to collect San Francisco's neighbourhoods as a single .geojson file (update the filepaths to match your local machine):
python WOF-csv-to-feature-collection -p /usr/local/mapzen/whosonfirst-data/data -c /usr/local/mapzen/whosonfirst-data/meta/WOF-neighbourhood-latest.csv --aliases /usr/local/mapzen/whosonfirst-properties/aliases/property_aliases.json -o ~/Desktop/SF_Neighbourhoods.geojson --slim --slim-template external_editor -f 85922583
Image: Data collection script in the terminal.
Note: The trailing number 85922583 at the end of this command is the WOF ID for San Francisco. When running this script, make sure to update that ID with whatever record you need neighbourhood geometries for. An ID is a unique identifier for records in WOF. Each record in WOF has an ID. To find the ID for your city, search the Spelunker and copy the ID.
Voilà! We have a .geojson of WOF neighbourhood records in San Francisco! By changing the trailing ID (explained below), you can collect neighbourhood records for your own locality (city).
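If you plan to collect neighbourhoods for several localities, it can help to assemble the command in one place so that only the ID and paths change. The sketch below is illustrative only: the helper name build_collect_command is hypothetical, and the flags are copied from the command above rather than from the script's own documentation.

```python
import os


def build_collect_command(locality_id, data_root, properties_root, output_path):
    """Assemble the argument list for the collection script (hypothetical helper)."""
    return [
        "python", "WOF-csv-to-feature-collection",
        # Path to the local copy of the WOF data
        "-p", os.path.join(data_root, "data"),
        # Metafile for the neighbourhood placetype
        "-c", os.path.join(data_root, "meta", "WOF-neighbourhood-latest.csv"),
        "--aliases", os.path.join(properties_root, "aliases", "property_aliases.json"),
        # Expand "~" ourselves so the path is usable without a shell
        "-o", os.path.expanduser(output_path),
        "--slim", "--slim-template", "external_editor",
        # WOF ID of the parent locality
        "-f", str(locality_id),
    ]


command = build_collect_command(
    85922583,  # San Francisco
    "/usr/local/mapzen/whosonfirst-data",
    "/usr/local/mapzen/whosonfirst-properties",
    "~/Desktop/SF_Neighbourhoods.geojson",
)
print(" ".join(command))
```

The resulting list could be passed to subprocess.run from the scripts folder, or simply printed and pasted into the terminal.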
To better understand what we're requesting of our command, here is a breakdown of exactly what is included:
python: Used to invoke Python.
WOF-csv-to-feature-collection: The Python script that collects WOF records.
-p /PATH/whosonfirst-data/data: Sets the path to the local copy of all the WOF data. You will need to update this PATH depending on where you checked the file out.
-c /PATH/whosonfirst-data/meta/WOF-neighbourhood-latest.csv: The metafile for the placetype you are interested in (neighbourhood). You will need to update this PATH depending on where you checked the file out.
--aliases /usr/local/mapzen/whosonfirst-properties/aliases/property_aliases.json: Pulls in various aliases for attribute fields for your output file.
-o ~/Desktop/SF_Neighbourhoods.geojson: Your output file.
--slim: Option parser to limit property export to a subset (roughly those in the CSV file) and reduce file size.
--slim-template: Option parser to trim key names to fit Esri Shapefile format (10 character length limit).
external_editor: Return only necessary attribute fields for neighbourhood edits.
-f 85922583: ID of the locality you need neighbourhood records for, found by searching our Spelunker.
About the external_editor option:
This option was created for our neighbourhood editors and exports only relevant and required record attributes for WOF neighbourhood records. All required attributes (and applicable optional attributes) should be included when filing your PR of updated neighbourhood records. The external_editor attribute fields include:
'name' The attribute that will be used for the wof:name field. Required for all files.
'id' The attribute that will be used for the wof:id field. Required for all files.
'placetype' The attribute that will be used for the wof:placetype field. Required for all files.
'parent_id' The attribute that will be used for the wof:parent_id field. Should equal the wof:id of the next placetype up in the feature's hierarchy. Required for all files.
'is_landuse_aoi' Used to signify an "area of interest" land use, different than the wof:placetype would suggest.
'supersedes' The attribute that will be used for the wof:supersedes field. Required for all files that supersede another WOF record.
'superseded_by' The attribute that will be used for the wof:superseded_by field. Required for all files that are superseded by another WOF record.
'deprecated' The attribute that will be used for the edtf:deprecated field. A date field (YYYY-MM-DD) used to signify when we determined this record was "junk" and incorrect since inclusion in Who's On First. Required for all files that are deprecated. (see: cessation vs. deprecated documentation)
'cessation' The attribute that will be used for the edtf:cessation field. A date field (YYYY-MM-DD) used to signify when we determined this record was no longer valid in Who's On First (usually when it has been replaced by another record). Also requires a new field called mz:is_current to be created. The mz:is_current field value is '0' if the record has an edtf:cessation date. (see: cessation vs. deprecated documentation)
'eng_preferred_name' Preferred alternative names for a feature.
'eng_variant_name' Variant alternative names for a feature.
'max_zoom' Maximum zoom level at which feature labels appear on a map. Required for all new features to Who's On First.
'min_zoom' Minimum zoom level at which feature labels appear on a map. Required for all new features to Who's On First.
'lbl_latitude' A decimal value of a feature's Y coordinate. Value should be derived from MapShaper and included on any new shape imported to Who's On First. New X/Y coordinate attributes will be created when running a geojson or shapefile through the MapShaper tool. These new fields will be used for lbl:latitude and lbl:longitude. The MapShaper tool is also available via http://mapshaper.org/.
'lbl_longitude' A decimal value of a feature's X coordinate. Value should be derived from MapShaper and included on any new shape imported to Who's On First. New X/Y coordinate attributes will be created when running a geojson or shapefile through the MapShaper tool. These new fields will be used for lbl:latitude and lbl:longitude. The MapShaper tool is also available via http://mapshaper.org/.
'rev_lat' Equal to the lbl_latitude value. This field will be used to rebuild the feature's hierarchy.
'rev_long' Equal to the lbl_longitude value. This field will be used to rebuild the feature's hierarchy.
'is_funky' Optional.
'is_hard_boundary' Optional.
'is_official' Optional.
'tier_locality' Optional.
'country_id' The wof:id of the feature's parent country.
'region_id' The wof:id of the feature's parent region.
'locality_id' The wof:id of the feature's parent locality.
'WOF_country' Usually equal to the ISO code, this is a two-letter key representing the country your features are in. Required for all features.
'iso_country' Set by the International Organization for Standardization, this is a two-letter key (listed here) representing the country your features are in. Required for all features.
'src_geom' Optional. Contains the source name of the feature's geometry. Given for research purposes.
'mz_note' Optional. Contains information about that given feature, if available.
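Before filing a PR, it can be useful to check that every feature still carries the fields listed above as required for all files or features. The following is a minimal sketch, assuming the exported GeoJSON uses exactly the key names quoted in this list (name, id, placetype, parent_id, WOF_country, iso_country); the function name and the sample features are hypothetical.

```python
# Fields described above as required for all files/features.
# Assumption: the export uses these exact property keys.
REQUIRED_FIELDS = ("name", "id", "placetype", "parent_id", "WOF_country", "iso_country")


def missing_required(feature):
    """Return the required property keys that are absent or empty on a feature."""
    props = feature.get("properties") or {}
    return [key for key in REQUIRED_FIELDS if props.get(key) in (None, "")]


# Hypothetical features with placeholder IDs, for illustration only.
complete = {"properties": {"name": "Alamo Square", "id": 1, "placetype": "neighbourhood",
                           "parent_id": 85922583, "WOF_country": "US", "iso_country": "US"}}
incomplete = {"properties": {"name": "Bret Harte", "id": 2, "placetype": "neighbourhood",
                             "WOF_country": "US", "iso_country": ""}}

for feature in (complete, incomplete):
    print(feature["properties"]["name"], missing_required(feature))
```

Conditionally required fields (for example supersedes or cessation) depend on the record's history and are not covered by this check.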
From the WOF repository for San Francisco, a total of 165 records for neighbourhoods (157 polygon geometries, 8 point geometries) were collected using our external_editor option. QGIS was used to preview Who’s On First neighbourhood shapes (below).
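A quick way to get the same kind of polygon/point tally for your own export is to count geometry types in the GeoJSON file. This is a rough sketch using only the Python standard library; the function names are hypothetical, and the tiny inline collection stands in for a real file such as ~/Desktop/SF_Neighbourhoods.geojson.

```python
import json
from collections import Counter


def count_geometry_types(feature_collection):
    """Tally GeoJSON geometry types; features without a geometry count as 'None'."""
    return Counter(
        (feature.get("geometry") or {}).get("type", "None")
        for feature in feature_collection["features"]
    )


def load_feature_collection(path):
    """Read a GeoJSON FeatureCollection from disk."""
    with open(path) as handle:
        return json.load(handle)


# Inline stand-in for a real export, for illustration only.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": {"type": "Polygon", "coordinates": []}},
        {"type": "Feature", "properties": {}, "geometry": {"type": "MultiPolygon", "coordinates": []}},
        {"type": "Feature", "properties": {}, "geometry": {"type": "Point", "coordinates": [0, 0]}},
    ],
}
counts = count_geometry_types(sample)
print(dict(counts))
```

Records that come back as Point are the ones that will need a polygon added later in the workflow.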
Image: San Francisco neighbourhood records (light purple) in comparison to the San Francisco Bay Area and the WOF shapes of San Francisco (blue).
You might notice the general shape of San Francisco present in the photo below, but it's tough to make out. Many of these WOF neighbourhood shapes cross into what most people would consider a different neighbourhood, and, in two cases, include areas in different counties. The good news? The majority of these neighbourhood records contain usable information in their WOF attribute fields. Also, be mindful of the differences between single-part and multi-part polygon edits in QGIS. If your neighbourhood shapes are single-part polygons to begin with, you are unable to add a multi-part polygon into your feature collection.
Image: San Francisco neighbourhood records from Who's On First projected in QGIS.
In some cities, we have detailed polygon shapes for most, but not all, neighbourhoods. Occasionally, we only know the name and the approximate point representing the label centroid. We need to establish a concordance between all WOF records, points and polygons. We'll need to add a polygon shape to replace these points; this is covered below in step four.
In this section, you will:
Because Who’s On First is liberally licensed open data, we must be selective about adding new data. We either need to find a new source that is open data with a CC-BY or CC-0 license that allows commercial and derivative works or create new shapes based on local knowledge and by cross-referencing multiple sources. Ideally, this new source should be an improvement over what Who’s On First already knows about the place.
In our example, the City and County of San Francisco hosts various neighbourhood-related shapefiles through its OpenData portal, so we had a few options to choose from.
Just because your locality hosts a neighbourhood dataset does not mean the neighbourhood geometries are usable. For example, city planning departments often group neighbourhoods together for planning purposes; you can start with these geometries, but they should be double-checked before import. For instance, if a shape is named Name 1 - Name 2 - Name 3 (e.g. Mission-Potrero-SoMa), it should probably be split into three polygons before import, one for each neighbourhood.
Don't blindly trust an authoritative set of neighbourhood shapes. Review a few other neighbourhood sources to compare names, attributes, shape detail, and coverage. Ensuring that you have an accurate set of neighbourhood shapes and adequate attributes will save time when reconciling the data with existing Who's On First records.
In San Francisco, we did not choose the planning department shapes, as those were too coarse and used more for statistical groupings (there weren’t as many neighbourhoods as we already had, and their shapes were far too big, more like macrohoods). After some research, we found that the Mayor's Office had created a set of geometries built to match local expectations, with a similar number of places to what was already in Who’s On First. Their colloquial shapes matched what we, as San Francisco locals, thought they should look like.
Once we verified the data was provided through an open license, we created a new source, sfgov, in our sources repository to give credit to the original author. This dataset was then downloaded to our desktop and added to a new QGIS document to compare with the existing shapes in WOF. Lucky for us, the SF OpenData is already in the WGS84 projection and does not need to be reprojected.
Lastly, we need to add any properties that are retained from the source data file to our property_aliases.json file. In San Francisco, for example, we have an sfgov:name field. In order to bring this property in, we will need to add it to the property_aliases.json file (see file for formatting examples).
In this section, you will:
Image: SF OpenData neighbourhood data projected in QGIS.
Once the data source was added to our source repository, the data was downloaded and placed into a new QGIS document (above) to compare to the geometries of WOF records. You can see the clean, non-overlapping geometries in the SF OpenData, unlike our existing WOF geometries (below).
Image: WOF records projected in QGIS.
Now that we have WOF records and data provided by the City of San Francisco, we can begin reconciling the two datasets. We will join the two datasets based on a common attribute; in this case the wof:name field from the WOF data was joined to the SF OpenData's name field. The join tool in QGIS can be found by navigating to the properties of the WOF .geojson layer and clicking the "Join" option (below).
Image: Join Properties tool in QGIS.
In an ideal world, all WOF records would join cleanly to SF OpenData records, but that is typically not the case. This join method worked for the most part, but because the spellings are not identical between the two attribute tables, the join needs to be verified and improved by hand. For example, QGIS's join tool did not join a value of Haight Ashbury to a value of Haight-Ashbury or a value of Mission District to a value of Mission. As described below, it's not a matter of which name field is "more correct", but a matter of importing additional names from your authoritative source while preserving the existing wof:name in an eng_x_variant name field.
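Some of these near-misses can be caught before the hand review by normalizing names on both sides of the join. The sketch below is one possible approach outside QGIS, not the method used for this import: the function names are hypothetical, and the IDs are placeholders rather than real WOF IDs.

```python
import re


def normalize_name(name):
    """Lowercase a name and replace punctuation/whitespace runs with single spaces."""
    return re.sub(r"[\W_]+", " ", name.lower()).strip()


def match_by_name(wof_records, source_names):
    """Map each source name to a WOF id when the normalized names agree, else None."""
    lookup = {normalize_name(record["name"]): record["id"] for record in wof_records}
    return {name: lookup.get(normalize_name(name)) for name in source_names}


# Placeholder IDs, for illustration only.
wof_records = [
    {"name": "Haight Ashbury", "id": 1},
    {"name": "Mission District", "id": 2},
    {"name": "Alamo Square", "id": 3},
]
matches = match_by_name(wof_records, ["Haight-Ashbury", "Mission", "Alamo Square"])
print(matches)
```

Normalization resolves the hyphenation difference, but Mission District and Mission still do not match; differences of that kind still need to be reconciled by hand.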
Alternatively, we could perform this join based on location instead of an attribute field. QGIS has functionality to perform a spatial join (some documentation here), which would be helpful if our WOF geometries were geographically similar to our administrative data. However, because our geometries in San Francisco overlap substantially with the SF OpenData geometries, an attribute join is more likely to give us matching records between the two datasets (generally, neighbourhood names are unique within a city). If you are unsure which join is best for your locality, give them both a try and compare the results.
Occasionally, Who's On First has duplicate records. Duplicate records should be noted in the mz:note property (with a "duplicate", "dupe" or "dup" value) or with a 1 value in the mz:is_funky or mz:is_current property. If you come across duplicate features when comparing and joining records, prefer the current record over the record with a mz:note of "duplicate", "dupe" or "dup" or a 1 value in the mz:is_funky or mz:is_current property.
Who's On First | SF OpenData | Note |
---|---|---|
Alamo Square | Alamo Square | in both, great! Let's import the new geometry! |
Anza Vista | Anza Vista | in both, great! Let's import the new geometry! |
Baja Noe | | no match, no alternate name spelling, WOF only |
 | Bret Harte | no WOF record, let's research. |
Haight Ashbury | | no name match, but does have alternate name: Haight-Ashbury |
Cathedral Hill | Cathedral Hill | in both, great! |
Image: Comparison of WOF and SF OpenData name attributes.
This method assigned wof:id values to each SF OpenData record that joined to a WOF record. After comparing, 96 of 117 SF OpenData records were assigned a wof:id. The records that did not join on the name field will have to be reconciled by hand, adding the wof:id manually whenever possible. This is most easily done by comparing each name in your WOF attribute table with each name in your source data.
Image: SF OpenData attribute table after joining datasets. Records with NULL values need to be imported by hand.
Examples of neighbourhoods that WOF did not have at the time of import are Apparel City and Buena Vista (meaning these records will need a new WOF ID). Since these neighbourhoods were not in the WOF database, we should consider importing them as new neighbourhood records.
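The join leaves two groups to deal with: source records that received no wof:id, and existing WOF records that no source record matched. As a rough sketch with placeholder IDs (the function name and data layout are hypothetical), the two groups can be separated like this, using the counts from this example (165 WOF records, 117 source records, 96 joined):

```python
def split_join_results(source_rows, wof_ids):
    """Split join output into source names lacking a wof:id and unmatched WOF ids.

    source_rows is a list of (source_name, wof_id_or_None) pairs after the join.
    """
    matched = {wof_id for _, wof_id in source_rows if wof_id is not None}
    needs_review = [name for name, wof_id in source_rows if wof_id is None]
    leftover = sorted(set(wof_ids) - matched)
    return needs_review, leftover


# Placeholder data shaped like the San Francisco example.
wof_ids = list(range(1, 166))                                      # 165 existing WOF records
joined = [("source %d" % i, i) for i in range(1, 97)]              # 96 joined by name
unjoined = [("source %d" % i, None) for i in range(97, 118)]       # remaining source records
needs_review, leftover = split_join_results(joined + unjoined, wof_ids)
print(len(needs_review), len(leftover))
```

The first group is reviewed by hand and either given an existing wof:id or imported as a new record; the second group is the set of existing WOF records that still need a decision.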
In this section, you will:
Remember - not all of the existing neighbourhood records matched to an SF OpenData geometry (96 new records were given existing IDs, but there were 165 existing neighbourhood records). This raises the question: What should happen to the 69 leftover neighbourhood records?
There are three options for the 69 leftover records:
parented_by value for the neighbourhood it falls within. People still use this name, but only the residents of those few city blocks.
Before importing the city-provided geometries, it is important to ensure the new neighbourhood boundaries will work in Who’s On First. While we can easily import the new neighbourhood geometries raw from our source (SF OpenData), we should "trust but verify" our data before the import.
The majority of geometries in the SF OpenData source were imported as-is, though four neighbourhood records in our San Francisco example were edited prior to import. Using our local knowledge and opinions, we adjusted these neighbourhood boundaries slightly.
After our WOF shapes were added to a QGIS document (below), they were given a new integer attribute field titled "status" based on concordance with our administrative source data. The status values were color-coded to display the following options for each of the WOF neighbourhood records:
Image: Developing an action plan in QGIS by assigning each record a status and reviewing WOF record matches with the new SF OpenData source. Colors represent status value.
Congratulations! You have finished collecting and evaluating new neighbourhood shapes for WOF! In the next part of this tutorial, we'll prepare the data for import. But first, a well-deserved break.
To finalize your work and prepare data for import, check out part two!
The Who's On First (WOF) project is not pretending to be the authority of truth, but rather a home for data from various sources and a project that we hope generates discussion. The WOF gazetteer houses data for many geographies, including counties (which parent cities) and cities (which parent microhoods, neighbourhoods, and macrohoods). Neighbourhoods are the places where we live and work in cities; the more neighbourhood data, the better.
Photo Credit: Travis Wise, Flickr
Our Audience: Those who care about neighbourhoods and have a few hours to a few days to spend diving into QGIS.
Your Skills/Experience: This tutorial was written with the intermediate to advanced GIS user. While this tutorial tries to explain the editing process in detail, some information may not be immediately clear to a beginning user.
Why are neighbourhoods important to mapping?
It is important to understand where placetypes fall within the WOF hierarchy. Neighbourhoods, macrohoods, and microhoods are described below:
This tutorial addresses the issues we found with San Francisco's neighbourhoods and describes the workflow we took in fixing them. In the end, we hope you can follow along and edit neighbourhoods for your own city!
What are your options when updating neighbourhoods?
Key terms:
Check out these key terms if you have any questions about the vocabulary used in this guide.
We received a report in the whosonfirst-data repository that the shape of San Francisco’s Golden Gate Park neighbourhood was both too small and extended into adjacent neighbourhoods. While researching online sources for a better shape, we noticed that most adjacent neighbourhood shapes in WOF could also be improved to align better with the road network and local expectations. After researching neighbourhood shapes online, we downloaded neighbourhood shapes for San Francisco from SF OpenData, a city data website, to compare with the neighbourhood records in our WOF repository. We then filed a new issue to handle all neighbourhood updates for San Francisco.
Typically, Who’s On First sources Quattroshapes geometries for most neighbourhoods around the globe. However, many neighbourhoods in the United States, including San Francisco, source their default geometry from Zetashapes. The Zetashapes project follows the same basic principles as Quattroshapes, but builds shapes up from Census 2010 features and can draw shapes that are too big, small, or just plain weird. We’ve seen problems with shapes extending in the water and far out into neighboring rural areas. This technique is responsible for the issues that we are correcting in San Francisco.
Drawing neighbourhood shapes is a tricky business. Strangers generally agree on what a neighbourhood is named and its rough shape, but even good friends can argue vehemently about where one neighbourhood ends and another begins - even if there are hard edges between neighbourhoods or they should overlap. Recognizing this, Who’s On First allows multiple alternate geometries for a place, but for practical reasons we need to set just one shape as the default geometry.
To clean up our neighbourhood geometries, we needed to take five steps:
While specifics listed in this tutorial may reference San Francisco, our hope is that you will be able to follow along with these steps to update the neighbourhood records in your locality.
In this section, you will either:
OR
git checkout
command in your terminal to clone necessary repositoriesThe Bundler Tool quickly gathers a collection of descendant records in GeoJSON format given your the parent record's ID. Simply click the "Download Descendants" link on any record's page in the Spelunker.
This is the "quick and easy" method to gathering neighbourhood records, however, this tool only gathers up to 500 WOF records. For larger queries or markets, you will need to clone necessary repositories and collect a .geojson file. Instructions below:
setuptools
for Python by downloading Python 2.7 (or a more current version) and GDAL 2.1
by downloading QGIS 2.14 (or a more current version).git checkout
on the WOF Data repository, WOF Properties repository, and WOF Utils repository.install
script in whosonfirst-utils repository.WOF-csv-to-feature-collection.py
script in your Utils
repo. Update line 63 with your local filepath.Once complete, entering the following string in the terminal from the whosonfirst-utils repository's scripts
folder allows us to collect San Francisco's neighbourhoods as a single .geojson file (updating filepaths of your local machine accordingly):
python WOF-csv-to-feature-collection -p /usr/local/mapzen/whosonfirst-data/data -c /usr/local/mapzen/whosonfirst-data/meta/WOF-neighbourhood-latest.csv --aliases /usr/local/mapzen/whosonfirst-properties/aliases/property_aliases.json -o ~/Desktop/SF_Neighbourhoods.geojson --slim --slim-template external_editor -f 85922583
Image: Data collection script in the terminal.
Note: The trailing number 85922583
at the end of this script is the WOF ID for San Francisco. When running this script, make sure to update that ID with whatever record you need neighbourhood geometries for. An ID is a unique identifier for records in WOF. Each record in WOF has an ID. To find the ID for your city, search the Spelunker and copy the ID.
Voilà! We have a .geojson of WOF neighbourhood records in San Francisco! By changing the trailing ID (explained below), you can collect neighbourhood records for your own locality (city).
To better understand what we're requesting of our command, here is a breakdown of exactly what is included:
python
Used to invoke pythonWOF-csv-to-feature-collection
The python script that collects WOF records.-p /PATH/whosonfirst-data/data
sets the path to the local copy of all the WOF data. You will need to update this PATH depending on where you checked the file out.-c /PATH/whosonfirst-data/meta/WOF-neighbourhood-latest.csv
The metafile for the placetype you are interested in - neighbourhood. You will need to update this PATH depending on where you checked the file out.--aliases /usr/local/mapzen/whosonfirst-properties/aliases/property_aliases.json
Pulling in various aliases for attribute fields for your output file.-o ~/Desktop/SFNeighbourhoods.geojson
Your output file.--slim
Option parser to limit property export to subset (roughly those in the CSV file) and reduce file size.--slim-template
Option parser to trim key names to fit Esri Shapefile format (10 charachter length limit).external_editor
Return only necessary attribute fields for neighbourhood edits.-f 85922583
ID of the locality you need neighbourhood records for, found by searching our Spelunker.About the external_editor
option:
This option was created for our neighbourhood editors and exports only relevant and required record attributes for WOF neighbourhood records. All required attributes (and applicable optional attributes) should be included when filing your PR of updated neighbourhood records. The external_editor
attribute fields include:
'name' The attribute that will be used for the wof:name
field. Required for all files.
'id' The attribute that will be used for the wof:id
field. Required for all files.
'placetype' The attribute that will be used for the wof:placetype
field. Required for all files.
'parent_id' The attribute that will be used for the wof:parent_id
field. Should equal the wof:id
of the next placetype up in the feature's hierarchy. Required for all files.
'is_landuse_aoi' Used to signify an "area of interest" land use, different than the wof:placetype
would suggest.
'supersedes' The attribute that will be used for the wof:supersedes
field. Required for all files that supersede another WOF record.
'superseded_by' The attribute that will be used for the wof:superseded_by
field. Required for all files that are superseded by another WOF record.
'deprecated' The attribute that will be used for the edtf:deprecated
field. A date field (YYYY-MM-DD) used to signify when we determined this record was "junk" and incorrect since inclusion in Who's On First. Required for all files that are deprecated. (see: cessation vs. deprecated documentation)
'cessation' The attribute that will be used for the edtf:cessation
field. A date field (YYYY-MM-DD) used to signify when we determined this record was no longer valid in Who's On First (usually when it has been replaced by another record). Also requires a new field called mz:is_current
to be created. The mz:is_current
field value is '0' if the record has a edtf:cessation
date. (see: cessation vs. deprecated documentation)
'eng_preferred_name' Preferred alternative names for a feature.
'eng_variant_name' Variant alternative names for a feature.
'max_zoom' Maximum zoom level at which feature labels appear on a map. Required for all new features to Who's On First.
'min_zoom' Minimum zoom level at which feature labels appear on a map. Required for all new features to Who's On First.
'lbl_latitude' A decimal value of a feature's Y coordinate. Value should be derived from MapShaper and included on any new shape imported to Who's On First. New X/Y coordinates attributes will be created when running a geojson or shapefile through the MapShaper tool. These new fields will be used for lbl:latitude
and lbl:longitude
. The MapShaper tool is also available via http://mapshaper.org/.
'lbl_longitude' A decimal value of a feature's X coordinate. Value should be derived from MapShaper and included on any new shape imported to Who's On First. New X/Y coordinates attributes will be created when running a geojson or shapefile through the MapShaper tool. These new fields will be used for lbl:latitude
and lbl:longitude
. The MapShaper tool is also available via http://mapshaper.org/.
'rev_lat' Equal to the lbl_latitude
value. This field will be used to rebuild the feature's hierarchy.
'rev_long' Equal to the lbl_longitude
value. This field will be used to rebuild the feature's hierarchy.
'is_funky' Optional.
'is_hard_boundary' Optional.
'is_official' Optional.
'tier_locality' Optional.
'country_id' The wof:id
of the feature's parent country.
'region_id' The wof:id
of the feature's parent region.
'locality_id' The wof:id
of the feature's parent locality.
'WOF_country' Usually equal to the ISO code, this is a two-digit key representing the country your features are in. Required for all features.
'iso_country' Set by the Internation Organization for Standardization, This is a two-digit key (listed here) representing the country your features are in. Required for all features.
'src_geom' Optional. Contains the source name of the feature's geometry. Given for research purposes.
'mz_note' Optional. Contains information about that given feature, if available.
From the WOF repository for San Francisco, a total of 165 (157 polygon geometries, 8 point geometries) records for neighbourhoods were collected using our external_editor
option. QGIS was used to preview Who’s On First neighbourhood shapes (below).
Image: San Francisco neighbourhood records (light purple) in comparison to the San Francisco Bay Area and the WOF shapes of San Francisco (blue).
You might notice the general shape of San Francisco present in the photo below, but it's tough to make out. Many of these WOF neighbourhood shapes cross into what most people would consider a different neighbourhood, and, in two cases, include areas in different counties. The good news? The majority of these neighbourhood records contain usable information in their WOF attribute fields. Also, be mindful of the differences between single-part and multi-part polygon edits in QGIS. If your neighbourhood shapes are single-part polygons to begin with, you are unable to add a multi-part polygon into your feature collection.
Image: San Francisco neighbourhood records from Who's On First projected in QGIS.
In some cities, we have detailed polygon shapes for most, but not all, neighbourhoods. Occasionally, we only know the name and the approximate point representing the label centroid. We need to establish a concordance between all WOF records, both points and polygons. We'll need to add a polygon shape to replace these points; this is covered below in step four.
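To see how many records are points, single-part polygons, or multi-part polygons before editing, a tally like the one below can help. This is a minimal sketch that assumes the records have been exported as a single GeoJSON FeatureCollection; the function names and the toy coordinates are hypothetical.

```python
from collections import Counter


def geometry_type_counts(feature_collection):
    """Count features by GeoJSON geometry type (Point, Polygon, MultiPolygon, ...)."""
    counts = Counter()
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        counts[geometry.get("type", "None")] += 1
    return counts


def point_only_names(feature_collection):
    """List the wof:name of every record that only has a point geometry."""
    names = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            names.append(feature.get("properties", {}).get("wof:name"))
    return names


# Toy data for illustration only; coordinates are not real boundaries.
collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"wof:name": "Alamo Square"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {"wof:name": "Baja Noe"},
         "geometry": {"type": "Point", "coordinates": [0.5, 0.5]}},
    ],
}
print(geometry_type_counts(collection))
print(point_only_names(collection))
```

The point-only names are the records that will need a polygon shape added by hand.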
In this section, you will:
Because Who’s On First is liberally licensed open data, we must be selective about adding new data. We either need to find a new source that is open data with a CC-BY or CC-0 license that allows commercial and derivative works or create new shapes based on local knowledge and by cross-referencing multiple sources. Ideally, this new source should be an improvement over what Who’s On First already knows about the place.
In our example, the City and County of San Francisco hosts various neighbourhood-related shapefiles through its OpenData portal, so we had a few options to choose from.
Just because your locality hosts a neighbourhood dataset does not mean the neighbourhood geometries are usable. For example, city planning departments often group neighbourhoods together for planning purposes; you can start with these geometries, but they should be double-checked before import. For instance, if a shape is named Name 1 - Name 2 - Name 3 (e.g. Mission-Potrero-SoMa), it should probably be split into three polygons before import, one for each neighbourhood.
Don't blindly trust an authoritative set of neighbourhood shapes. Review a few other neighbourhood sources to compare names, attributes, shape detail, and coverage. Ensuring that you have an accurate set of neighbourhood shapes and adequate attributes will save time when reconciling the data with existing Who's On First records.
In San Francisco, we did not choose the planning department shapes, as those were too coarse and used more for statistical groupings (there were fewer neighbourhoods than we already had, and their shapes were far too big, more like macrohoods). After some research, we found that the Mayor's Office had created a set of geometries built to match local expectations, with a similar number of places to what was in Who’s On First. As San Francisco locals, we found that their colloquial shapes matched what we thought the neighbourhoods should look like.
Once we verified the data was provided through an open license, we created a new source, sfgov
, in our sources repository to give credit to the original author. This dataset was then downloaded to our desktop and added to a new QGIS document to compare with the existing shapes in WOF. Lucky for us, the SF OpenData is already in the WGS84 projection and does not need to be reprojected.
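If you are unsure whether a downloaded file is already in WGS84, one quick (and non-authoritative) sanity check is to confirm that every coordinate falls inside the valid longitude and latitude ranges. The sketch below assumes GeoJSON-style geometries; the function names are hypothetical. Passing this check does not prove the projection is WGS84, but failing it strongly suggests the data needs to be reprojected.

```python
def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position: [x, y] or [x, y, z].
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def looks_like_lon_lat(geometry):
    """True if every position is within longitude/latitude degree ranges."""
    return all(
        -180 <= x <= 180 and -90 <= y <= 90
        for x, y in iter_positions(geometry["coordinates"])
    )


# Toy geometries for illustration; neither is a real neighbourhood boundary.
degrees = {"type": "Polygon",
           "coordinates": [[[-122.51, 37.76], [-122.45, 37.76],
                            [-122.45, 37.78], [-122.51, 37.76]]]}
projected = {"type": "Polygon",
             "coordinates": [[[6000000.0, 2100000.0], [6001000.0, 2100000.0],
                              [6001000.0, 2101000.0], [6000000.0, 2100000.0]]]}
print(looks_like_lon_lat(degrees), looks_like_lon_lat(projected))
```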
Lastly, we need to add any properties retained from the source data file to our properties_alias.json file. In San Francisco, for example, we have an sfgov:name
field. In order to bring this property in, we will need to add it to the property_alias.json file (see file for formatting examples).
In this section, you will:
Image: SF OpenData neighbourhood data projected in QGIS.
Once the data source was added to our source repository, the data was downloaded and placed into a new QGIS document (above) to compare to the geometries of WOF records. You can see the clean, non-overlapping geometries in the SF OpenData, unlike our existing WOF geometries (below).
Image: WOF records projected in QGIS.
Now that we have WOF records and data provided by the City of San Francisco, we can begin reconciling the two datasets. We will join the two datasets based on a common attribute; in this case the wof:name
field from the WOF data was joined to the SF OpenData's name
field. The join tool in QGIS can be found by navigating to the properties of the WOF .geojson layer and clicking the "Join" option (below).
Image: Join Properties tool in QGIS.
In an ideal world, all WOF records would join cleanly to SF OpenData records, but that is typically not the case. This join method worked for the most part, but because the spellings are not identical between each of the attribute tables, this join needs to be verified and improved by hand. For example, QGIS's join tool did not join a value of Haight Ashbury
to a value of Haight-Ashbury
or a value of Mission District
to a value of Mission
. As described below, it's not a matter of which name field is "more correct", but a matter of importing additional names from your authoritative source while preserving the existing wof:name
in a eng_x_variant
name field.
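Outside of QGIS, the same attribute join can be approximated with a little name normalization, which catches spelling differences such as a hyphen versus a space. This is a minimal sketch, not the QGIS join tool itself; the function names and the wof:id values are placeholders. Note that it still cannot match Mission District to Mission, so those cases remain a manual step.

```python
import re


def normalize_name(name):
    """Lowercase a name and treat hyphens, underscores and slashes as spaces."""
    spaced = re.sub(r"[-_/]+", " ", name.lower())
    return re.sub(r"\s+", " ", spaced).strip()


def join_by_name(wof_records, source_names):
    """Map each source name to a wof:id, or None when no normalized name matches."""
    wof_index = {normalize_name(r["wof:name"]): r["wof:id"] for r in wof_records}
    return {name: wof_index.get(normalize_name(name)) for name in source_names}


# Placeholder IDs for illustration; real wof:id values are much larger integers.
wof_records = [
    {"wof:id": 1, "wof:name": "Haight Ashbury"},
    {"wof:id": 2, "wof:name": "Mission District"},
    {"wof:id": 3, "wof:name": "Alamo Square"},
]
source_names = ["Haight-Ashbury", "Mission", "Alamo Square"]

matches = join_by_name(wof_records, source_names)
print(matches)
```

Any source name that maps to None is a candidate for hand reconciliation or for a new record.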
Alternatively, we could perform this join based on location instead of an attribute field. QGIS has functionality to perform a spatial join (some documentation here), which would be helpful if our WOF geometries were geographically similar to our administrative data. However, because our geometries in San Francisco overlap substantially with the SF OpenData geometries, an attribute join is more likely to give us matching records between the two datasets (generally, neighbourhood names are unique within a city). If you are unsure which join is best for your locality, try both and compare the results.
Occasionally, Who's On First has duplicate records. Duplicate records should be noted in the mz:note
property (with a "duplicate", "dupe" or "dup" value), with a 1
value in the mz:is_funky
property, or with a 0
value in the mz:is_current
property. If you come across duplicate features when comparing and joining records, prefer the current record over any record with an mz:note
of "duplicate", "dupe" or "dup", a 1
value in the mz:is_funky
property, or a 0
value in the mz:is_current
property.
Who's On First | SF OpenData | Note |
---|---|---|
Alamo Square | Alamo Square | in both, great! Let's import the new geometry! |
Anza Vista | Anza Vista | in both, great! Let's import the new geometry! |
Baja Noe | | no match, no alternate name spelling, WOF only |
| | Bret Harte | no WOF record, let's research. |
Haight Ashbury | | no name match, but does have alternate name: Haight-Ashbury |
Cathedral Hill | Cathedral Hill | in both, great! |
Image: Comparison of WOF and SF OpenData name attributes.
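The comparison in the table above can be sketched with plain set operations. This is an illustrative snippet using the names from the table; the function name `compare_names` is hypothetical, and exact string matching is assumed, so alternate spellings land in the "only" buckets and still need a manual look.

```python
def compare_names(wof_names, source_names):
    """Split names into those in both datasets, WOF only, and source only."""
    wof, src = set(wof_names), set(source_names)
    return {
        "in_both": sorted(wof & src),
        "wof_only": sorted(wof - src),
        "source_only": sorted(src - wof),
    }


wof_names = ["Alamo Square", "Anza Vista", "Baja Noe",
             "Haight Ashbury", "Cathedral Hill"]
source_names = ["Alamo Square", "Anza Vista", "Bret Harte",
                "Haight-Ashbury", "Cathedral Hill"]

report = compare_names(wof_names, source_names)
for bucket, names in report.items():
    print(bucket, names)
```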
This method assigned wof:id
values to each SF OpenData record that joined to a WOF record. After comparing, 96 of 117 SF OpenData records were assigned a wof:id
. For the records that did not join on the name
field, we will have to reconcile by hand, adding the wof:id
manually whenever possible. The easiest way to do this is to compare each name in your WOF attribute table against each name in your source data.
Image: SF OpenData attribute table after joining datasets. Records with NULL values need to be imported by hand.
Examples of neighbourhoods that WOF did not have at the time of import are Apparel City
and Buena Vista
(meaning these records will need a new WOF ID). Since these neighbourhoods were not in the WOF database, we should consider importing them as new neighbourhood records.
In this section, you will:
Remember: not all of the existing neighbourhood records matched an SF OpenData geometry (96 new records were given existing IDs, but there were 165 existing neighbourhood records). This raises the question: what should happen to the 69 leftover neighbourhood records?
There are three options for the 69 leftover records:
parented_by
value for the neighbourhood it falls within. People still use this name, but only the residents of those few city blocks.
Before importing the city-provided geometries, it is important to ensure the new neighbourhood boundaries will work in Who’s On First. While we can easily import the new neighbourhood geometries raw from our source (SF OpenData), we should "trust but verify" our data before the import.
The majority of geometries in the SF OpenData source were imported as-is, though four neighbourhood records in our San Francisco example were edited prior to import. Using our local knowledge and opinions, we adjusted these neighbourhood boundaries slightly.
After our WOF shapes were added to a QGIS document (below), they were given a new integer attribute field titled "status", based on concordance with our administrative source data. The status values were color-coded to display the following options for each of the WOF neighbourhood records:
Image: Developing an action plan in QGIS by assigning records' status in QGIS, reviewing WOF record matches with new SF OpenData source. Colors represent status value.
Congratulations! You have finished collecting and evaluating new neighbourhood shapes for WOF! In the next part of this tutorial, we'll prepare the data for import. But first, a well-deserved break.
To finalize your work and prepare data for import, check out part two!