De-duplicating Who’s On First venues with vector embeddings
Using four different Who’s On First venue repositories for testing, I have been able to first deprecate about 45,000 duplicate records and then, second, derive over 100,000 concordances with Overture Data place records, 8,000 concordances with All The Places venues and another 500 concordances with ILMS museum records. There are almost certainly still bugs, or at least “gotchas”, but importantly the work so far passes the “better than yesterday” test.
This is a blog post by thisisaaronland. It was published on August 16, 2024 and tagged venues, download, whosonfirst, wof, data, overture and alltheplaces.