Dark data

Dark data is data that is not ‘understood’. According to a study by IBM, it typically accounts for 80% of organisational data, and IBM estimates that this will rise to 93% by 2020.

Dark data arises when business processes leave data behind, deposited in caches across the business. Typical examples include:

  • Data residing in individual desktop spreadsheets;
  • Documents and presentations, and their successive iterations, created where version control has been absent;
  • Email attachments, downloaded, ignored and unmanaged;
  • Legacy data and systems;
  • Customer information that is past its sell-by date;
  • Unstructured data (pictures, diagrams, photos);
  • Files and notes generated by previous employees or relating to previous projects;
  • Analytics, reports and survey data;
  • Logs, dead account information and transaction histories.

Dark data may be junk, or it may be a hoard with potential value; you won’t know until you get to grips with it. A business case can, however, be made that doing so is worthwhile:

“New York City’s Administration for Children’s Services (ACS) struggled with a problem common to many state and local departments and agencies: multiple sources of data, on various platforms, in various formats, of various quality. Analysts would pull from different sources to report on the same performance measures only to come up with different answers. Eighty percent of their time was spent searching and massaging data and only 20 percent of their time was spent analyzing and reporting on the data. They realized they needed to fix their data before analysts, managers, and executives would have confidence in the analytic results.” (Source: KPMG)