Data strategy: curation

Curation is the active maintenance of the value of data over time, much as one would care for an objet d’art. It involves securing the integrity of the record, its audit trail and provenance, and continuously managing the data through its lifecycle:

  • from creation/aggregation/ingestion through accession and organisation to eventual archiving for posterity, or purging for reasons of privacy or obsolescence;
  • ensuring that it is reliably retrievable, with record integrity, audit trail and provenance intact;
  • safeguarding reusability and state; in some instances (e.g. in financial services) it may be appropriate to archive raw data upon ingestion and to preserve the data at every step, through every change and annotation, however small;
  • handling data masking, abstraction, redaction, disclosure and encryption as these relate to enduring (but mutable) security and privacy requirements;
  • handling governance imperatives (e.g. under GDPR) such as data destruction and erasure, and the auditability thereof, in response to the same security and privacy requirements.
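
One way to keep an audit trail verifiably intact is to chain its entries together with hashes, so that any later alteration of an earlier entry becomes detectable. The following is a minimal sketch of that idea in Python, not a prescribed implementation: the function names (append_event, verify_trail), the field names and the sample record are all hypothetical, and timestamps are left out to keep the example deterministic.

```python
import copy
import hashlib
import json


def _digest(entry: dict) -> str:
    """Hash an entry using canonical JSON so equal entries hash equally."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(trail: list, record_id: str, action: str,
                 actor: str, source: str) -> dict:
    """Append one lifecycle event, linked to the previous entry's hash."""
    entry = {
        "seq": len(trail),
        "record_id": record_id,
        "action": action,      # e.g. ingested, annotated, archived, erased
        "actor": actor,        # who made the change
        "source": source,      # provenance: where the data came from
        "prev_hash": trail[-1]["hash"] if trail else "",
    }
    entry["hash"] = _digest(entry)  # computed before the "hash" key exists
    trail.append(entry)
    return entry


def verify_trail(trail: list) -> bool:
    """Return True only if every entry is unaltered and correctly linked."""
    prev_hash = ""
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical lifecycle of a single record.
trail = []
append_event(trail, "rec-001", "ingested", "loader", "crm_export")
append_event(trail, "rec-001", "annotated", "analyst", "crm_export")
append_event(trail, "rec-001", "erased", "privacy_officer", "crm_export")
trail_ok = verify_trail(trail)

# Altering an earlier entry after the fact breaks the chain.
tampered = copy.deepcopy(trail)
tampered[1]["actor"] = "someone_else"
tampered_ok = verify_trail(tampered)
```

Note that the erasure itself is logged as an event: the trail records that the data was destroyed and by whom, without retaining the erased content, which is what makes the destruction auditable.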

The curator may be dealing not only with endogenous data (including potential uses of dark data or data swamps) but also with data collected from other and new sources. The aim is to blend, aggregate, validate, reconcile and re-combine or re-present the data as a richer and more valuable information source, e.g. a data mart, and to support more insightful analytics.
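
As an illustration of the validating and reconciling step, the sketch below compares records held internally with records from an external source, keyed on a shared identifier, and reports where they agree, conflict, or exist on one side only. It is a simplified example under stated assumptions: the reconcile function, the customer identifiers and the field names are hypothetical, and real sources would usually need key matching and normalisation first.

```python
def reconcile(internal: dict, external: dict, fields: list) -> dict:
    """Compare two keyed record sets field by field and classify each key."""
    report = {"matched": [], "conflicts": [],
              "internal_only": [], "external_only": []}
    for key in sorted(internal.keys() | external.keys()):
        if key not in external:
            report["internal_only"].append(key)
        elif key not in internal:
            report["external_only"].append(key)
        else:
            # Collect (internal, external) value pairs that disagree.
            diffs = {
                f: (internal[key].get(f), external[key].get(f))
                for f in fields
                if internal[key].get(f) != external[key].get(f)
            }
            if diffs:
                report["conflicts"].append((key, diffs))
            else:
                report["matched"].append(key)
    return report


# Hypothetical sample data: an internal system and a newly acquired source.
internal = {
    "C1": {"name": "Acme Ltd", "country": "GB"},
    "C2": {"name": "Borealis", "country": "NO"},
    "C3": {"name": "Cygnus", "country": "FR"},
}
external = {
    "C1": {"name": "Acme Ltd", "country": "GB"},
    "C2": {"name": "Borealis AS", "country": "NO"},
    "C4": {"name": "Delta", "country": "DE"},
}

report = reconcile(internal, external, ["name", "country"])
```

The report leaves the resolution of conflicts to the curator rather than silently picking a winner, which keeps the decision (and its rationale) available for the audit trail.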

Curation also covers all the processes relevant to the principled and controlled creation of data, its management and its maintenance. Curation should deliver sustainable data quality, preserving the integrity of the data in scope throughout its lifecycle, because reliable onward use of that data depends on it.

Curation tends to be human-initiated rather than machine-initiated.