The risks of undiscovered data
What data do you have, and what don’t you?
Experience teaches that organisations should consider data discovery services if any of the following apply:
- Data they have but don’t understand;
- Data they don’t know they have;
- Data they need but don’t have;
- Data they have but don’t govern;
- Legacy or acquired data that has not been scrutinised.
Do you have data issues?
Since upwards of 80% of a typical organisation’s data is dark data (data that is unknown to it, and not understood), can you be confident that your organisation is aware of these common data issues?
- Third-party, legacy or ‘acquired’ data may contain unpleasant surprises;
- Legacy data is more valuable if it retains its lineage and relationships;
- Legacy data may need to be disentangled from legacy rules or code;
- Duplicate data may have inconsistent definitions;
- Poor metadata will prejudice semantic integrity;
- Data that lacks meaning, provenance or context lacks value;
- Obsolete entries may persist unnoticed;
- Data may be held without consent;
- Data that should have been deleted or archived can adversely impact live systems;
- Metadata may be unsuitable for current business needs;
- Multiple instances of what should be a single point of reference will prejudice any AI initiative (a simple profiling check for this is sketched after this list).
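To make that last point concrete, here is a minimal sketch of the kind of profiling check a discovery exercise runs, using pandas over an entirely hypothetical reference-data extract; the column names and values are illustrative, not drawn from any real system:

```python
# Sketch: detecting multiple, inconsistent instances of reference data.
import pandas as pd

# Hypothetical extract: the same reference code defined in several systems.
records = pd.DataFrame({
    "source_system": ["CRM", "ERP", "Legacy", "CRM"],
    "country_code":  ["UK", "GB", "UK", "UK"],
    "country_name":  ["United Kingdom", "Great Britain",
                      "U.K.", "United Kingdom"],
})

# Normalise the obvious spelling variants before comparing definitions.
records["name_norm"] = (records["country_name"]
                        .str.upper()
                        .str.replace(r"[^A-Z ]", "", regex=True)
                        .str.strip())

# Flag codes that carry more than one distinct definition: these are the
# duplicated "single points of reference" that corrupt reporting and any
# AI training data built on top of them.
conflicts = (records.groupby("country_code")["name_norm"]
             .nunique()
             .loc[lambda n: n > 1])
print(conflicts)  # country_code "UK" resolves to two different names
```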
Data discovery minimises the impact of data issues
In the age of GDPR, companies are expected to know what data is held in their domain. The risks of not knowing are clear:
- non-compliance fines and reputational damage;
- poor BI, MI or AI insights;
- business effort undermined;
- non-performant systems;
- unplanned effort & resources requiring unplanned funding; or
- digital project stasis.
Why data discovery services are still advisable
Definitions of ‘data discovery’ vary enormously, particularly amongst tool vendors. This matters because scope creep beyond the core ‘discover the data’ task has the potential to derail organisations from their business objectives, driving work that proceeds without reference to their digital strategy or environment.
Whether you intend to use search-based or visualisation-based data discovery tools, the characteristics that Gartner states are shared by both approaches require careful thought:
- the use of a “proprietary data structure to store and model data gathered from disparate sources, which minimizes reliance on predefined business intelligence (BI) metadata” may be appealing as a quick fix, but will eventually need revisiting to align those metadata with the wider ontological environment;
- the use of “RAM or indexing that lessens the need for aggregates, summaries and pre-calculations” may not be suitable for extreme data environments; our BI experience is that pre-calculations and aggregation remain vital filtration techniques for optimising BI & analytics, as the short sketch below illustrates.
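As a minimal illustration of that point (the schema and numbers are hypothetical), the pandas sketch below builds a pre-calculated aggregate once, so that downstream queries filter a few hundred summary rows rather than re-scanning a million-row fact table on every request:

```python
# Sketch: pre-calculation and aggregation as a filtration technique.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000_000  # stand-in for a large fact table

sales = pd.DataFrame({
    "region":  rng.choice(["North", "South", "East", "West"], n),
    "product": rng.choice([f"P{i:02d}" for i in range(100)], n),
    "amount":  rng.uniform(1, 500, n),
})

# Built once, queried many times: 400 summary rows instead of 1,000,000.
summary = (sales.groupby(["region", "product"])["amount"]
           .agg(total="sum", orders="count")
           .reset_index())

# A dashboard query now filters the small aggregate, not the fact table.
north = summary[summary["region"] == "North"]
print(north.head())
```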
Whilst discovery tools may be attractive, you may still benefit from our data discovery services: identifying, extracting, profiling & assessing the relevant, critical data & metadata required by the business in the digital pursuit of its strategic objectives.
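For a flavour of the profiling step in that service, the sketch below computes a first-pass column profile (completeness, cardinality, duplication) over a hypothetical customer extract; the field names are illustrative only:

```python
# Sketch: first-pass column profiling of a hypothetical customer extract.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 4, None],
    "email":       ["a@x.com", None, "b@x.com", "b@x.com", "c@x.com"],
    "consent":     [True, False, None, True, True],
})

# Completeness, cardinality and duplication per column: the raw inputs
# to any assessment of whether data is fit for the business's purpose.
profile = pd.DataFrame({
    "non_null":   customers.notna().sum(),
    "null_pct":   customers.isna().mean().round(2),
    "distinct":   customers.nunique(),
    "duplicates": customers.apply(lambda c: int(c.dropna().duplicated().sum())),
})
print(profile)
```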