A conversation about whether it's a good idea or not :)
I'm interested in increasing the amount of UPRN data held within OSM.
I know there are some sensitivities about both UPRNs (given UK licensing restrictions on address data) and bulk updating, so I thought I'd ask here for a sense check on my methodology before I attempt to make any changes.
Before I started:
- I read the extremely helpful ref:GB:UPRN wiki page.
- I used Rob's GB Unique Property Reference Numbers (UPRN) tool to examine UPRNs in my area.
- I set out to add all building footprints in a small area (I focused on a single UK electoral ward of fewer than 5,000 buildings).
I noticed on Rob's UPRN map that many of the building footprints I added have a one-to-one relationship with UPRN nodes. That is, many UPRN points were a single node entirely surrounded by a single building footprint polygon.
There are a number of cases where this is not true, including:
- Apparent non-buildings: UPRNs appear to be assigned to some utilities and other non-residential features. I'm not interested in these for now, so I'm ignoring them.
- Apparent buildings with more than one UPRN node inside. These are interesting: they may indicate houses in multiple occupation, sites where previous buildings have been removed and infill housing built, or just an error in my building footprint location. But there are not many of these overall, so for now I'm ignoring them.
Targeting these clean one-to-one cases:
1. I took the latest UPRN statistical release, the National Statistics UPRN Lookup (April 2022).
2. I noted that the document "NSUL User Guide Apr 2022.pdf" inside the download describes the UPRN data as "derived from AddressBase™ Epoch 91 with an OS refresh date of 3 March 2022", and that the licensing terms for this data are the Open Government Licence ("Our address products (derived from AddressBase®) are subject to the Open Government License."). Following this back to their licensing page, it appears to be v3.
3. I took the UPRN data and a Geofabrik download of the county, then used osmium to slice out the electoral ward based on a GeoJSON polygon extracted from MapIt.
4. I filtered the extract down to just polygons with the `building=*` tag.
5. I imported my UPRN data as GeoJSON into QGIS (learning about projections along the way...). I took a punt at `EPSG:7405`, which seemed to align broadly with what I'm seeing on OSM and Rob's UPRN map.
6. I used QGIS to select all polygons that surrounded a UPRN node and then filtered these by count, producing a layer with a clean one-to-one relationship between building footprints and UPRNs.
7. Finally I excluded any buildings that already had UPRNs applied so as not to overwrite any existing data.
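Steps 6 and 7 were done in QGIS, but the same filter can be expressed in a few lines of plain Python, which may help anyone wanting to repeat it as a script. This is a minimal sketch, not the QGIS workflow itself. It assumes footprints are simple polygons (outer ring only, so no holes or multipolygon relations), that footprints and UPRN points are already in the same projected CRS, and that the data has been loaded into dicts of a hypothetical shape. It also adds one extra guard not in the steps above: a UPRN that falls inside two overlapping footprints is treated as ambiguous.

```python
# Sketch of steps 6-7 in plain Python (not the QGIS workflow actually used).
# Assumptions: footprints are simple polygons (outer ring only, no holes or
# multipolygon relations) and footprints and UPRN points share one projected CRS.
from collections import defaultdict


def point_in_ring(x, y, ring):
    """Ray-casting test; points lying exactly on an edge may go either way."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count crossings of a ray running from (x, y) towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bbox(ring):
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def one_to_one_matches(buildings, uprns):
    """Return {way_id: uprn} for clean one-to-one cases.

    buildings: {way_id: {"ring": [(x, y), ...], "tags": {...}}}  (hypothetical shape)
    uprns:     {uprn: (x, y)}
    """
    uprns_in = defaultdict(list)    # way_id -> UPRNs inside that footprint
    containing = defaultdict(list)  # uprn -> footprints containing that UPRN
    for way_id, b in buildings.items():
        x0, y0, x1, y1 = bbox(b["ring"])
        for uprn, (x, y) in uprns.items():
            # Cheap bounding-box check before the full polygon test.
            if x0 <= x <= x1 and y0 <= y <= y1 and point_in_ring(x, y, b["ring"]):
                uprns_in[way_id].append(uprn)
                containing[uprn].append(way_id)

    matches = {}
    for way_id, found in uprns_in.items():
        if len(found) != 1:
            continue  # several UPRNs (flats, HMOs, unsplit semis): left for later
        uprn = found[0]
        if len(containing[uprn]) != 1:
            continue  # UPRN sits in overlapping footprints: ambiguous
        if "ref:GB:uprn" in buildings[way_id]["tags"]:
            continue  # step 7: never overwrite an existing value
        matches[way_id] = uprn
    return matches
```

With three toy squares, one holding a single UPRN, one holding two UPRNs and one already tagged, only the first is returned. The brute-force double loop is fine for a single ward; a spatial index would be the obvious next step for anything larger.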
The result is a GeoJSON export from QGIS with UPRN data applied to ~3500 buildings within the ward.
Things to sense check
- I'm pretty sure I'm good on licensing, as the only data inputs here are UPRN data from the ONS, which is under the Open Government Licence, and building footprints derived from Bing aerial imagery and surveys. Can anyone see anything I might have missed that would prevent an upload?
- Data quality could be an issue here. However, from a sample I'm reasonably confident that the one-to-one relationship has isolated fairly simple cases. Errors (perhaps from poorly located footprints or projection errors) are likely to be small and could be validated later against other UPRN sources and perhaps addresses, if only to highlight them.
- This would be a bulk update. I know we generally don't like these, for good reason, but the area is small and easily revertible. In addition, this feature is not something we can gain by survey, so the only alternative is a manual, tag-by-tag version of the methodology above.
- Are UPRNs even a good idea in OSM, given they're not a physical feature we can survey? I feel pretty strongly in the "yes" camp here. The theory is that they are unique IDs that can facilitate joining OSM data to other datasets, which unlocks a bunch of useful applications and raises the utility and relevance of OSM building footprint data.
- I'm trying a pilot to join UPRN data to building footprints, just based on what nodes fall within polygons.
- I excluded the more difficult cases, and have about 3,500 buildings I'm confident tagging.
- Interested in thoughts on whether this is a good idea.
- If folks sound positive, I'd trial it in my area before writing it up and thinking about whether we could try this in other wards or over a wider geographical area.
I hope this is an appropriate place to put this. I went around the houses a bit: I started out here (https://help.openstreetmap.org/), but after reading the FAQ I wasn't convinced it was the right place.
I'm always a bit intimidated by the mailing list, so I figured I'd try here.
I've run a test with the API on a single node (I tried to use the dev servers, but alas the data there doesn't seem to be up to date), and it seems to have been successful.
I've put my extremely crummy Python code up here.
Exactly what I had been thinking, with the same conclusions.
Trying to think of glitches.....
Maybe it depends on the quality of the building plotting? The one-UPRN-to-one-building rule may successfully prevent false positives.
Non-split buildings (lots of semi-detached pairs are plotted as a single building) will reduce the number of matches.
That looks like a great way forward. Just a very basic question, how does your solution deal with existing tags for ref:GB:uprn? The reason I ask is that sometimes the published UPRN data is wrong (e.g. allocated to the wrong building) and has been corrected in OSM using local knowledge. So will your solution duplicate or delete these?
Item 7 above reads "Finally I excluded any buildings that already had UPRNs applied so as not to overwrite any existing data."
Just a thought, will your solution use the 'source=*' tag?
I do think that UPRNs need to be within OSM.
Couple of thoughts:
Is there a rollback process available?
Is there a test or pre-production environment available for inspecting the work?
The natural expectation can only be that the whole of the UK will be added in time through some sort of rolling process. Is there a programme in place to do this?
Is any QA/QC process being applied to ensure that the data applied to OSM is validated?
Looking at the code (I am not a Python developer, but I have written a lot of block-structured code): line 18 is a for loop which continues to line 29. The first part of the block is an error check which then drops through to the rest of the block, so the error it finds is not handled. Is this my misunderstanding of Python or a code issue?
Yes to UPRNs in OpenStreetMap. When we established the tag, the main concern was adding UPRN tags on their own (i.e. just a node with the UPRN as the sole tag). This is not the case in your proposal here.
Licence is ok too so long as you are getting UPRNs from either the OS Open Data portal or the ONS NSUL dataset. Both are using the Open Government Licence.
Bulk import - fine if done well. Sounds like you are thinking things through and taking your time (which is a good sign). Allow time for comments here in case others think of something else and don't forget to follow the guidelines at https://wiki.openstreetmap.org/wiki/Import/Guidelines
Quality is a hard one. Nothing is perfect including manual edits. You need to be confident that the edit is of sufficient quality and demonstrate steps to this. The one thing that comes to mind that is not mentioned above is what happens if the building data in OSM is not accurate and has an offset? OSM UK host the cadastral tile layer which you can use to check the locational accuracy of the buildings.
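One rough way to test for a systematic offset, alongside eyeballing the cadastral layer, is to nudge all footprints by small amounts and see whether a non-zero shift captures noticeably more UPRN points. This is a minimal sketch of that idea, not an established method from this thread. It assumes both layers are in metres in a projected CRS (for example British National Grid), footprints are simple outer rings, and it is run on a sample of a few hundred buildings, because the brute-force grid search is slow. The step and range values are arbitrary illustrative choices.

```python
# Rough check for a systematic offset between OSM footprints and UPRN points.
# Assumptions: both layers are in metres in a projected CRS, footprints are
# simple outer rings, and only a sample is used (this search is brute force).


def point_in_ring(x, y, ring):
    """Ray-casting test; points lying exactly on an edge may go either way."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def hits(rings, points, dx, dy):
    """How many points fall inside some footprint once every footprint moves by (dx, dy)."""
    count = 0
    for x, y in points:
        # Moving the footprint by (dx, dy) is the same as moving the point by (-dx, -dy).
        if any(point_in_ring(x - dx, y - dy, ring) for ring in rings):
            count += 1
    return count


def best_shift(rings, points, step=0.5, max_shift=3.0):
    """Grid-search shifts up to +/- max_shift metres; ties on count go to the smallest shift."""
    n = round(max_shift / step)
    scored = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = i * step, j * step
            scored.append((hits(rings, points, dx, dy), -(dx * dx + dy * dy), dx, dy))
    count, _, dx, dy = max(scored)
    return dx, dy, count
```

If the best shift is well away from (0, 0) and catches clearly more UPRNs than `hits(rings, points, 0, 0)`, the footprints (or the CRS chosen on import) may be offset. UPRN points are not always centred in their building, so this is a prompt to check the cadastral layer, not proof of an offset.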
Wow so much great feedback, thanks everyone.
Let me summarise a few points and some actions for me.
@Jez Nicholson yeah, good questions all. I have a degree of confidence in this as I added this entire ward's worth of building footprints myself. All were traced from Bing, but with an eye on the cadastral parcels too (I mapped everything with an imagery offset applied, and checked a central road against GPS traces and my own GPS survey)...
I'd want to think harder about how to manage lower-confidence building footprints, but it could be interesting to use cadastral parcels as another filter, again looking for a one-to-one relationship (easy wins again)?
@Nick Ananin great suggestion on source. I'll go back over the Python API and work out how to get `source` in... I think it's a property of the changeset.
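If `source` goes on the changeset, it is set once as a changeset tag rather than on every way. The Python wrapper presumably has its own call for this; as a wrapper-independent sketch, this builds the XML body that the OSM API 0.6 takes for `PUT /api/0.6/changeset/create`. Making `comment` and `source` mandatory is my own assumption, added as a guard against uploading without them.

```python
# Sketch: the XML body the OSM API 0.6 expects for PUT /api/0.6/changeset/create,
# with `source` carried as a changeset tag rather than on every way.
import xml.etree.ElementTree as ET

REQUIRED_TAGS = ("comment", "source")  # assumption: refuse to upload without these


def changeset_create_body(tags):
    """Build <osm><changeset><tag k=... v=.../>...</changeset></osm>."""
    missing = [key for key in REQUIRED_TAGS if not tags.get(key)]
    if missing:
        raise ValueError(f"missing changeset tags: {', '.join(missing)}")
    osm = ET.Element("osm")
    changeset = ET.SubElement(osm, "changeset")
    for key, value in tags.items():
        ET.SubElement(changeset, "tag", k=key, v=value)
    return ET.tostring(osm, encoding="unicode")
```

For example, `source` could name the NSUL release and its licence (e.g. "ONS National Statistics UPRN Lookup (April 2022), OGL v3"); the exact wording is a hypothetical choice.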
@Tony Shield yep, good questions. On rollback, I have the way IDs of all the footprints I'm changing. I'll keep those for a period (let's say six months?). To roll back I'd just need to clear the UPRN tag from those ways. I'll also put it all in a single changeset; I'd hope a changeset could be reverted.
I tried using a non-production environment to test, but I couldn't get requests to the dev server to work via the Python wrapper. I was getting 404s, so I don't think the data there is up to date. One option would be to create a local OSM instance, provision only my area and then execute the code against that. However, I'm a little tempted to just go for it in my single ward, then revert and make tidy edits if necessary... It's not ideal, but in the worst case I have confidence in the rollback plan.
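Reverting the whole changeset (for example with JOSM's reverter plugin) is the blunt option. Because the UPRN data can be wrong and get corrected locally later, a targeted rollback should probably only remove values that are still exactly what the bulk edit added. This minimal sketch assumes the saved record keeps the UPRN alongside each way ID, and that each way has been downloaded fresh (current version, node list and tags). It produces an osmChange document for `POST /api/0.6/changeset/{id}/upload`; the `generator` string is hypothetical.

```python
# Sketch of a targeted rollback: an osmChange <modify> document that removes
# ref:GB:uprn only from ways that still carry the value the bulk edit added.
import xml.etree.ElementTree as ET


def rollback_osmchange(current_ways, added, changeset_id):
    """current_ways: freshly downloaded {way_id: {"version": int, "nodes": [...], "tags": {...}}}
    added: the saved record of the edit, {way_id: uprn_added}
    Returns (osmChange XML string, list of way ids skipped for manual review).
    """
    root = ET.Element("osmChange", version="0.6", generator="uprn-rollback-sketch")
    modify = ET.SubElement(root, "modify")
    skipped = []
    for way_id, uprn in added.items():
        way = current_ways.get(way_id)
        if way is None or way["tags"].get("ref:GB:uprn") != uprn:
            # Deleted, or the UPRN was changed by someone since: leave it alone.
            skipped.append(way_id)
            continue
        el = ET.SubElement(
            modify, "way",
            id=str(way_id), version=str(way["version"]), changeset=str(changeset_id),
        )
        # A modify must carry the complete way: every node ref and every other tag.
        for ref in way["nodes"]:
            ET.SubElement(el, "nd", ref=str(ref))
        for key, value in way["tags"].items():
            if key != "ref:GB:uprn":
                ET.SubElement(el, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode"), skipped
```

The skipped list is the useful by-product: those are the ways someone has touched since the import, which deserve a manual look rather than an automatic revert.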
QA and quality are good questions. I'm mapping quite intensively in the ward and intend to work with the UPRN data, so I should spot any major errors and be in a position to correct them. I'd look to do a spot-check validation against Rob's UPRN map and debug from there. Beyond that... I'm tempted to think we'd need a check against another dataset, all of which have licensing woes.
And Tony, you are spot on about my Python: I only put a message in the if statement (it turned out everything in my JSON was a way anyway), but I can tighten that up to do proper error handling.
So I propose to go for it with my one electoral ward. Assuming that goes OK and I don't revert, I'll write up what I've found (which buildings are left untagged, interesting edge cases, what was easy and what was hard...).
Then perhaps I'll see if I can turn some of this into a script that others could test on an area they are familiar with.
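A spot check is easier to report on if the sample is random and reproducible. This minimal sketch draws a fixed-seed sample of tagged buildings to compare against Rob's UPRN map, and computes what a clean sample actually supports: if n checked buildings show no errors, the 95% upper bound on the error rate solves (1 - p)^n = 0.05, which is roughly 3/n (the "rule of three"). The sample size and seed are arbitrary illustrative choices.

```python
# Sketch: a reproducible random sample of tagged buildings to check by eye,
# plus what a sample with no errors does and doesn't prove.
import random


def spot_check_sample(matches, k=50, seed=2022):
    """Return up to k (way_id, uprn) pairs; the same seed gives the same sample."""
    rng = random.Random(seed)
    items = sorted(matches.items())
    return rng.sample(items, min(k, len(items)))


def zero_error_upper_bound(n, confidence=0.95):
    """Upper bound on the error rate if n checked buildings showed no errors.

    Solves (1 - p) ** n = 1 - confidence; roughly 3 / n at 95% confidence.
    """
    return 1 - (1 - confidence) ** (1 / n)
```

For instance, 50 clean checks only bound the error rate at about 6% (`zero_error_upper_bound(50)` is roughly 0.058), which across ~3500 buildings could still mean around 200 wrong tags, so a larger sample or a second dataset would tighten this considerably.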
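On the error handling Tony spotted: the fix is to `continue` past (and record) anything unexpected rather than printing a message and dropping through to the upload logic. This is a minimal sketch of reading the QGIS GeoJSON export into `{way_id: uprn}` pairs that way. The property names `osm_type`, `osm_id` and `uprn` are hypothetical and would need to match whatever the export actually contains.

```python
# Sketch: turn the QGIS GeoJSON export into {way_id: uprn}, handling bad rows
# explicitly instead of printing a message and falling through.
# The property names osm_type, osm_id and uprn are hypothetical.
import logging

log = logging.getLogger("uprn-upload")


def extract_pairs(feature_collection):
    pairs, problems, conflicted = {}, [], set()
    for feature in feature_collection.get("features", []):
        props = feature.get("properties") or {}
        if props.get("osm_type") != "way":
            problems.append((props.get("osm_id"), "not a way"))
            continue  # skip it; don't drop through to the upload logic
        try:
            way_id = int(props["osm_id"])
            uprn = int(props["uprn"])
        except (KeyError, TypeError, ValueError) as exc:
            problems.append((props.get("osm_id"), f"bad id or UPRN: {exc!r}"))
            continue
        if uprn <= 0:
            problems.append((way_id, "UPRN must be a positive integer"))
            continue
        if way_id in pairs and pairs[way_id] != str(uprn):
            problems.append((way_id, "conflicting UPRNs"))
            conflicted.add(way_id)
            continue
        pairs[way_id] = str(uprn)
    for way_id in conflicted:
        pairs.pop(way_id, None)  # never upload an ambiguous way
    for item, reason in problems:
        log.warning("skipping %s: %s", item, reason)
    return pairs, problems
```

Returning the problem list as well as logging it means a run can be refused outright if anything unexpected turns up, which suits a one-off bulk edit better than silently skipping.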
Far down the line and with much more brainpower we could look at what a solution for the whole UK could be 🤔. Though I'm tempted to take this very slowly.