Would your car run well on dirty fuel?
When considering data quality, the first principle should be that an incomplete record has no value. (It is arguable whether data that is not fully described deserves the title ‘data’ at all.)
The first task, therefore, is to remove (or archive) incomplete or unusable data quickly and painlessly, including records whose referential integrity has been severely compromised. Such records only clog up the works and carry a considerable opportunity and performance cost; they may account for as much as 80% of your data assets. That, of course, means that you need to understand your data and be able to discover quickly what data is actually in your packaged and other applications, so that you are in a position to make an informed decision.
The data jumbles that remain are your next concern: formats or parsing that are incorrect; address data that is not in the right fields; and metadata that is incomprehensible or simply ‘missing’. (Incomprehensible metadata can be the result of misguided input, where whole sentences were entered as metadata and subsequently chopped into meaningless chunks through several data migration exercises. And yes, that really does happen!) The likelihood is that you will need to do at least some preliminary wrangling in order for data quality tools to have something they can work with. Data has been known to be simply too poor to load!
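As a minimal sketch of that first pass, the snippet below splits records into those worth keeping and those to archive. The field names in REQUIRED_FIELDS and the sample records are hypothetical; what counts as 'complete' has to come from your own data model.

```python
# Hypothetical required fields; in practice these come from your own data model.
REQUIRED_FIELDS = ("supplier_id", "name", "postcode")


def is_complete(record, required=REQUIRED_FIELDS):
    """A record is complete when every required field is present and non-blank."""
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


def split_records(records, required=REQUIRED_FIELDS):
    """Return (keep, archive) so incomplete records are set aside rather than lost."""
    keep, archive = [], []
    for record in records:
        (keep if is_complete(record, required) else archive).append(record)
    return keep, archive


# Illustrative data only.
records = [
    {"supplier_id": "S1", "name": "Acme Ltd", "postcode": "AB1 2CD"},
    {"supplier_id": "S2", "name": "", "postcode": "EF3 4GH"},
    {"supplier_id": None, "name": "Orphan", "postcode": "IJ5 6KL"},
]
keep, archive = split_records(records)
print(len(keep), "kept;", len(archive), "archived")
```

Archiving rather than deleting keeps the decision reversible if a 'useless' record later turns out to matter.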
To merge or not to merge…?
Your data quality tooling will reveal duplicate records, and for each set you will need to decide whether merging, de-duplicating or establishing a sibling relationship is the best way forward.
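To illustrate the kind of candidate list such tooling produces, here is a rough sketch that groups records on a normalised key. Real tools use far more sophisticated matching; the field names and sample suppliers are assumptions made for the example.

```python
from collections import defaultdict


def normalise(value):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(value.lower().split())


def find_duplicate_groups(records, key_fields):
    """Group records that share the same normalised key; return groups of 2+."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalise(str(record.get(field, ""))) for field in key_fields)
        groups[key].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Illustrative data only: ids 1 and 2 differ just in case and spacing,
# while id 3 shares a name but sits at a different postcode.
suppliers = [
    {"id": 1, "name": "Acme Ltd", "postcode": "AB1 2CD"},
    {"id": 2, "name": "ACME  ltd", "postcode": "ab1 2cd"},
    {"id": 3, "name": "Acme Ltd", "postcode": "ZZ9 9ZZ"},
]
duplicates = find_duplicate_groups(suppliers, ("name", "postcode"))
for group in duplicates:
    print("Possible duplicates:", [record["id"] for record in group])
```

Note that the third record is not flagged: same name, different postcode. That is exactly the sort of case where a sibling relationship, rather than a merge, may be the right answer, and it remains a human decision.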
Metadata & taxonomies
Another headache may be lurking in the metadata or taxonomies. If your data dictionary has 9,000 or more entries, then the likelihood is that it will need some attention too, and that you need a corporate ontology.
Transactional information and its accompanying documentation have their own regulatory and referential requirements and challenges, as suppliers may have multiple sites to which invoices, purchase orders and supplier personnel need to relate. When considering uploads or migrations, you need to maintain the referential integrity of the transaction as a whole, which means deciding which documents can be uploaded and in what order.
If you are using data quality tools, some validation beforehand may be wise, in order to evaluate what mismatches there might be between the data model of the tool and that of the data to be processed. The idea is to check those constraints before putting through a large data load, since a load that fails partway represents a significant time cost because of the size of the data. Using a validation checker will enable you to reproduce any constraints before loading the data, so that the tool is already aligned to the model. You want to avoid putting a square peg in a round hole!
Referential integrity is a key part of data value. So if you are migrating data, it is worth preserving this where it is feasible to do so. Historic data in obsolete systems may be painful and fiddly to retrieve, but it could represent a considerable organisational asset. Notwithstanding that, some brave souls have made short-term decisions about value which we hope they do not regret down the track.
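One way to work out an upload order is to treat the document types as a dependency graph and sort it so that anything referenced is loaded before whatever refers to it. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the document types and their dependencies are hypothetical and would need to reflect your own model.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each document type maps to the types it refers to,
# which therefore have to be loaded first.
DEPENDS_ON = {
    "supplier": set(),
    "supplier_site": {"supplier"},
    "supplier_contact": {"supplier_site"},
    "purchase_order": {"supplier_site"},
    "invoice": {"purchase_order", "supplier_site"},
}

# static_order() yields every type after all of the types it depends on.
load_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(" -> ".join(load_order))
```

If the dependencies contain a cycle, graphlib raises a CycleError, which is itself a useful early warning that the model needs attention before any upload begins.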
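A pre-load check of this kind might look something like the sketch below. It is not the API of any particular data quality tool: the constraint format, the field names and the sample rows are all assumptions made for illustration.

```python
def validate(rows, constraints):
    """Return a list of (row_index, field, problem) for every violation found."""
    problems = []
    # Track values already seen for each field that must be unique.
    seen = {field: set() for field, rule in constraints.items() if rule.get("unique")}
    for index, row in enumerate(rows):
        for field, rule in constraints.items():
            value = row.get(field)
            if value is None:
                if rule.get("required"):
                    problems.append((index, field, "missing"))
                continue
            if not isinstance(value, rule["type"]):
                problems.append((index, field, "wrong type"))
                continue
            if "max_length" in rule and len(value) > rule["max_length"]:
                problems.append((index, field, "too long"))
            if rule.get("unique"):
                if value in seen[field]:
                    problems.append((index, field, "duplicate"))
                seen[field].add(value)
    return problems


# Hypothetical constraints mirroring what the target model would enforce.
CONSTRAINTS = {
    "invoice_no": {"type": str, "required": True, "unique": True, "max_length": 10},
    "amount": {"type": float, "required": True},
}

# Illustrative data only.
rows = [
    {"invoice_no": "INV-001", "amount": 120.0},
    {"invoice_no": "INV-001", "amount": 75.5},
    {"invoice_no": "INV-0000000002", "amount": "12"},
    {"amount": 10.0},
]
problems = validate(rows, CONSTRAINTS)
for row_index, field, problem in problems:
    print(f"row {row_index}: {field} is {problem}")
```

Running a check like this on a sample, or on the whole file, is cheap compared with discovering the same violations hours into a failed load.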
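A simple way to see how much referential integrity has survived, before or after a migration, is to look for 'orphans': child records that point at a parent which does not exist. The sketch below assumes hypothetical supplier and invoice records held as plain dictionaries.

```python
def find_orphans(children, parents, foreign_key, parent_key="id"):
    """Return child rows whose foreign key matches no parent row."""
    parent_ids = {parent[parent_key] for parent in parents}
    return [child for child in children if child.get(foreign_key) not in parent_ids]


# Illustrative data only.
suppliers = [{"id": "S1"}, {"id": "S2"}]
invoices = [
    {"invoice_no": "INV-1", "supplier_id": "S1"},
    {"invoice_no": "INV-2", "supplier_id": "S9"},  # refers to an unknown supplier
    {"invoice_no": "INV-3"},                       # has no supplier at all
]
orphans = find_orphans(invoices, suppliers, "supplier_id")
print([invoice["invoice_no"] for invoice in orphans])
```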
Realising the value of your Cloud investment relies on putting value into it. Human nature being what it is, it is unlikely that errors will be rectified once data has been uploaded to the Cloud; nor will there be any appetite to do so. The result is that any data errors passed to the Cloud will likely persist, eventually adding significantly to your Cloud outlay. Cloud can store more and bigger data but also more and bigger rubbish, so it is important to attend to data quality, both at upload and as an ongoing task.
So blockchain, AI and ML all look very sexy. But putting aside the technical ‘do I or don’t I’ discussions on those for a moment, the simple truth is that to make them work for you, you have to feed them with quality data. Particularly in the case of blockchain, any errors should, theoretically, be incapable of correction due to the nature of blockchain itself. So they will persist, again theoretically, ad infinitum.
Whatever you want to do in terms of technology, don’t lose sight of the essential need for data quality. After all, you wouldn’t expect a car to perform if you put dirty fuel in the tank, would you?
Incentives for quality data
IDG Research carried out a valuable study on the Impact of Data Effectiveness for Sage in 2014. The study found:
“that improvements in the accessibility, usability, quality, and intelligence of data have a direct and positive impact on critical business outcomes:
Companies with more effective data grow 35% faster.
Only 40% of companies rate their ability to process customer demands on the road as excellent.
More intelligent data means more revenue—a 20% improvement brings $9,216 more per employee.
Companies with better intelligence are 2.2% more profitable.
Companies with better intelligence are 4 times more likely to optimize inventory levels.
Companies with better data improve consistent quality delivery to customers by 9%.
Companies with more usable data increase productivity by 10%.
Companies with mobile access to data increase sales of new products by 5%.
Companies with mobile access to data sell 3% more to new customers.
Successful companies are 4 times more likely to process orders remotely.”
“companies with highly effective ERP systems that provide more usable and accessible data [were] more likely to realize these outcomes.”