Data quality - digital metamorphosis

Why data quality matters to the business

Poor data quality processes are corrosive

Poor data quality, and taking a reactive rather than proactive stance to data errors, is a massive own goal. Poor DQ negatively impacts the business in that:

resources are diverted to disproportionate & avoidable data maintenance activities – 80% of data scientists spend their time on DQ;
data quality reports are time-consuming and out of date by the time they are produced – stale reports requested by senior execs compromise decision-making;
the ingestion of external data compromises clean internal data, impacting the entire domain – postal address data for instance;
poor postal address data (typically bought in) causes costly downstream inefficiencies – in the US, as much as $14m per annum;
KYC & AML initiatives can be inadequate if they rely on obsolete or unverified data, which can also trigger avoidable print and postage costs;
reputational risk arises from significant lag between data creation and imperative external deadlines.

Financial sector

Financial data, such as trades & transactions, is highly regulated and subject to time stamping. Delays and mistakes are not only corrosive; they also have much more immediate and damaging implications.

The European Securities and Markets Authority (ESMA) noted in its annual data quality review (“EMIR and SFTR data quality report 2020” Pg. 9): “Reported issues include, among others, abnormal/irregular values caused by counterparties misreporting, access to current and historical TR regulatory reports, different kinds of TR related issues, and problems with data reconciliation.” Not only do abnormal values indicate governance & reconciliation issues, but comparative analysis of archived reports can also be a challenge. ESMA also noted that the presence of abnormal values could “introduce significant biases to any economic/financial risk analysis relying on the data”.

Health sector

From our experience of a DQ project in healthcare, there are immediate benefits from getting services in place alongside tooling. The tool of choice, Oracle Enterprise Data Quality, was suited to the client’s existing IT environment. Not only did the client solve its DQ issues, but it also enacted workflow processes that put the time-consuming burden of parsing onto data input personnel instead of on data stewards. Cleaner data meant this agency no longer sent out important information to unvalidated (and frequently changing) addresses. Obsolete addresses used to generate significant print and postage expenditure. In addition, workflow benefits included less lag between data creation and usage, thanks to integration with the CRM.

Begin with a DQ review

There is a range of data quality tools to choose from, depending upon the size of your domain, your budget and the existing environment. This choice may be informed by a Data Quality Review. Whichever tool you choose, you will still benefit from DQ services from information architects who understand how to optimise the use of that tool, position it within your existing data workflows, and integrate it with other applications or systems. We have mentored much larger incumbents through installation of DQ tooling, and ensured the integrity of the deployment. We have advised clients on the best way to integrate the data quality system with their existing environment and headed off potential problems. The wider DQ environment matters and we have the expertise to assist you with that, including product-specific skills where applicable.

During a Data Quality Review we would profile, grade & report on typical data issues such as:

de-duping;
sibling relationships;
classification;
parsing & format consistency.
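
As a rough illustration of what profiling for de-duping and format consistency can look like, here is a minimal Python sketch. It is not the method used by any particular DQ tool: the sample records, the field names and the helper functions (normalise, shape, profile) are all hypothetical and exist only for this example.

```python
import re
from collections import Counter

# Hypothetical sample records; field names are illustrative only.
RECORDS = [
    {"name": "Ann Lee", "postcode": "SW1A 1AA"},
    {"name": "ann  lee ", "postcode": "sw1a 1aa"},
    {"name": "Bob Ray", "postcode": "12345"},
]


def normalise(value):
    """Collapse whitespace and case so near-identical values compare equal."""
    return " ".join(value.split()).casefold()


def shape(value):
    """Reduce a value to a format pattern: letters become A, digits become 9."""
    value = re.sub(r"[A-Za-z]", "A", value.strip())
    return re.sub(r"[0-9]", "9", value)


def profile(records, key_fields, format_field):
    """Count likely duplicates and the distinct formats seen in one field."""
    # Records whose normalised key fields match are treated as duplicates.
    keys = Counter(
        tuple(normalise(r[f]) for f in key_fields) for r in records
    )
    duplicates = sum(n - 1 for n in keys.values() if n > 1)
    # More than one shape in a field suggests inconsistent formatting.
    shapes = Counter(shape(r[format_field]) for r in records)
    return {
        "records": len(records),
        "duplicates": duplicates,
        "formats": dict(shapes),
    }


report = profile(RECORDS, ["name", "postcode"], "postcode")
print(report)
```

In this sketch the first two records collapse to the same key once case and spacing are normalised, so one duplicate is reported, and the postcode field shows two distinct shapes. A real review would go further than this, for example by using fuzzy matching and validating against reference data.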

The aim is clean, consistently formatted, valid data & metadata for onward information engineering, provisioning and curation capable of supporting BI & AI initiatives.