De-duplicating Who’s On First venues with vector embeddings

Using four different Who’s On First venue repositories for testing, I have been able to first deprecate about 45,000 duplicate records and then derive over 100,000 concordances with Overture Data place records, 8,000 concordances with All The Places venues and another 500 concordances with ILMS museum records. There are almost certainly still bugs, or at least “gotchas”, but importantly the work so far passes the “better than yesterday” test.

This is a blog post by thisisaaronland. It was published on August 16, 2024 and tagged venues, download, whosonfirst, wof, data, overture and alltheplaces.

Privatezen

The first week I started at Mapzen, in 2015, I remember thinking “I wonder if I can swap out each one of the third-party services used by Privatesquare with an equivalent Mapzen service?” The answer, at the time, was “No”. It was a useful reminder of the work we had set out for ourselves.

This is a blog post by thisisaaronland. It was published on February 02, 2018 and tagged electron, mapzen, privacy, privatesquare, sqlite, venues and whosonfirst.

Venues, Postal Codes… and All Those GitHub Repositories

Multiply "a lot of venues, even in the smallest of communities" by the "entire planet" and you’ve got… well, a lot of venues.

This is a blog post by thisisaaronland. It was published on October 07, 2016 and tagged whosonfirst and venues.