Spatial data

As IoT develops, the dictum “everything is somewhere” will become increasingly relevant. Locational data will abound, perhaps in greater volume than organisations know what to do with. Those that do know will have their work cut out, as the challenges and risks of managing spatial data multiply significantly compared with other data issues.

Since the advent of the smartphone, spatial data has no longer been confined to describing geographic or terrestrial features. It represents the locational attributes of objects relative to the context in which they exist.

Spatial data can be either structured (vectors: the reference points of polygonal geometries and 3D shapes) or unstructured (rasterised items formed of pixels, such as topologies, satellite imagery, scans, photos and remote sensing data). Either kind can be mapped. Rasters work on grids; vectors on points. Add to this heady mix the notion of not just 3D but 4D (time), plus multiple attributes, and you can see that it begins to challenge compute resources. Certainly the days of 2D paper records have been left well behind. 2D records described the x/y coordinates of objects such as property boundaries. 3D mapping added height and depth: sub-terrestrial and super-terrestrial attributes such as coal fields or apartment blocks. The fourth dimension, time, is about the relative positioning of movable assets such as driverless cars.
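
To make the vector/raster distinction concrete, here is a minimal sketch in plain Python. The parcel coordinates, the 5 x 4 grid, the helper names (`polygon_area`, `point_in_polygon`, `rasterise`) and the `Observation4D` record are all illustrative assumptions, not taken from any particular GIS product. It stores one rectangular parcel as a vector (a handful of vertices), derives a raster of the same parcel (one value per grid cell), and shows what a 4D record for a moving asset might look like.

```python
from dataclasses import dataclass

# Vector form: a parcel boundary stored as an ordered list of (x, y) vertices.
# Coordinates are hypothetical, in arbitrary map units.
parcel = [(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if the point (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(vertices, width, height, cell_size=1.0):
    """Raster form: a grid of 0/1 cells, 1 where the cell centre is inside."""
    return [
        [
            1 if point_in_polygon((col + 0.5) * cell_size,
                                  (row + 0.5) * cell_size,
                                  vertices) else 0
            for col in range(width)
        ]
        for row in range(height)
    ]


@dataclass
class Observation4D:
    """One position fix for a movable asset: x, y, height and time."""
    x: float
    y: float
    z: float
    t: float  # seconds since some agreed epoch (an assumption)


grid = rasterise(parcel, width=5, height=4)
for row in grid:
    print(row)
print("vector area:", polygon_area(parcel))
print("raster cells inside:", sum(map(sum, grid)))
```

The vector needs only four vertices regardless of how finely the area is later drawn, whereas the raster needs a value for every cell it covers. That asymmetry is what drives the storage and processing differences discussed in the rest of this section.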

Spatial data is therefore immensely complex, with interrelationships across multiple dimensions and perspectives.

What’s different about spatial data?

The challenges of spatial data are an order of magnitude greater than those of managing data quality in normal business functions.

  1. There are many sources for spatial data. It is not unusual for there to be multiple datasets, each considered inviolate. It is as much a cultural and legalistic challenge as a technical one to determine what should constitute the absolute reference and golden record.
  2. The data may be ‘owned’ by multiple authorities. Without a single source of the truth, it is difficult for organisations and individuals to establish complete and accurate spatial data quickly, unambiguously and with certainty, both for economic purposes and to securitise land assets.
  3. The data siblings may have been acquired over many years, with varying periodicity or frequency. This means the datasets are unlikely to be canonical: formats will be inconsistent, and the data will have been captured to varying standards of accuracy or under differing parsing policies. It may need considerable wrangling.
  4. There may be inconsistencies between the vectors and rasters for the same location.
  5. It is difficult to extract structured information from rasterised images, which are themselves very large and therefore consume compute power and disk space, at a cost to processing speed and data storage. Raster datasets are big because they record a value for every cell in an image, so their size grows with resolution. This is why planning departments have traditionally been left untouched for many years: the files in their proprietary systems are simply too large to manipulate easily. Because rasterised images are composed of pixels, it is difficult to apply rules, classification or automation, and the allocation of attributes is limited. The sheer size of rasterised images means that decisions on scalability and file size constraints must be made early.
  6. Continuous data (measurements, such as time series data), e.g. output from remote sensors, is not easily displayed as vectors. Topology combined with vector data is very rich but requires intensive processing. Any feature edits have to apply consistently to both topology and vector data. Managing a lot of features, or constant feature updates, may therefore require complex algorithms for vector manipulation.
  7. Spatial data, because it includes both structured and unstructured data (i.e. from the very large to the very small), is ‘lumpy’. This means that the performance of affected systems will vary immensely and that there may be periodic drains on bandwidth. Care is therefore needed in setting up appropriately powerful databases, and performant cloud systems are difficult to engineer.
  8. Spatial data requires a lot of human intervention to decide on anything approaching a golden record, as dusty land deeds and historic (in all senses) references must be consulted. Whatever is decided must also satisfy the legal code.
  9. Spatial data is big data. Think of all the locational data that one human being generates during a normal day from phones, cars (generating 350 MB of data every second by 2020), public transport, smart utilities, IoT devices, apps etc. You need serious horsepower to support spatial systems and analysis.
  10. Arising from this, there is also the matter of the varying audiences for spatial data. Taking land as an example, it is relevant to multiple interested parties: the owner, the purchaser, the solicitor, the registrar, and those looking after tourism or ancient monuments. Every one of those audiences requires a different perspective on the data.
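
The raster storage point in item 5 can be illustrated with some simple arithmetic. The sketch below is a rough estimate only: the function name `raster_size_bytes`, the 10 km x 10 km extent, the three bands and the one byte per value are all assumptions chosen for illustration, and real formats usually apply compression and tiling that change the figures.

```python
import math


def raster_size_bytes(width_m, height_m, resolution_m, bands=1, bytes_per_value=1):
    """Uncompressed size of a raster covering width_m x height_m.

    resolution_m is the edge length of one square cell in metres.
    """
    cols = math.ceil(width_m / resolution_m)
    rows = math.ceil(height_m / resolution_m)
    return cols * rows * bands * bytes_per_value


# Hypothetical 10 km x 10 km area stored as a 3-band, 8-bit image
# at progressively finer cell sizes.
for resolution in (10.0, 1.0, 0.25):
    size = raster_size_bytes(10_000, 10_000, resolution, bands=3)
    print(f"{resolution} m cells: {size / 1e6:,.1f} MB uncompressed")
```

Under these assumptions the same area grows from about 3 MB at 10 m cells to about 4.8 GB at 0.25 m cells, because halving the cell size quadruples the number of cells. This is one reason why decisions about resolution and file size need to be made early.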