Data-centricity

Data lies at the heart of every system or application. Data-centric enterprise architecture puts the data first, rather than the application. Why? Because applications are introduced, upgraded or retired, yet the data is always there, circulating around the enterprise or its ecosystem.

Yet data is typically the organisational ‘Cinderella’. Organisations devote significant spend to building code or developing applications, but data… not so much. Organisations are far more accustomed to investing in the apparent tangibility of software or hardware, and less so in the very thing that makes everything else tick, even where there is a sound business case for doing so.

Change, OMG!

The transformation away from ‘peer-to-peer’ systems requires a shift of perspective for organisations, and for some, despite the rationale, it is simply a culture change too far.

Procurement constraints

Another contributing factor is undoubtedly the challenge of procurement. Making the case for a new CRM is more sellable as a concept than making the case for de-furring the corporate arteries. Where board members and senior executives struggle to grasp the fundamental import of straightforward technology for their organisation, the esotericism of investing in data is lost on them entirely, to the detriment of the organisation.

Data-driven, into a swamp

A further layer of confusion is added by the belief that ‘data-driven’ is the same as ‘data-centric’. A data-driven organisation is one that is literally ‘driven’ by data; it acts on data. The manifestation of this is the deployment of many different applications and databases, each focused on its own data bubble. A data-driven culture is therefore necessarily internally, if not departmentally, focused, missing the big organisational or extra-organisational perspective. Further, it is a culture that is happy to spend on maintaining applications that will become outmoded sooner rather than later due to new requirements or advances; for example, you can’t easily re-engineer a packaged application to take account of GDPR requirements.

“Businesses want functionality, and they purchase or build application systems.  Each application system has its own data model, and its code is inextricably tied with this data model.  It is extremely difficult to change the data model of an implemented application system, as there may be millions of lines of code dependent on the existing model.” (Source: TDAN)
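
To make the quotation concrete, here is a minimal, hypothetical sketch (all table and field names are invented for illustration) of how application code becomes welded to its own private data model:

    # Hypothetical: a CRM application with its own private data model.
    # Every query, report and integration below depends on this exact shape,
    # so changing the model means changing all of the dependent code.
    crm_customer = {
        "cust_id": 1001,
        "cust_nm": "Acme Ltd",         # abbreviated, CRM-specific field names
        "addr_ln1": "1 High Street",
    }

    # Meanwhile, the billing application models the *same* customer differently.
    billing_account = {
        "account_no": "A-1001",
        "legal_name": "Acme Limited",  # same real-world entity, different model
        "billing_address": "1 High Street",
    }

    def monthly_invoice_line(account: dict) -> str:
        # Code like this, multiplied across thousands of modules, is why the
        # data model of an implemented application is so hard to change.
        return f"{account['account_no']}: {account['legal_name']}"

    print(monthly_invoice_line(billing_account))  # A-1001: Acme Limited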

All that the data-driven organisation has to look forward to is an ever-increasing, and increasingly unusable, data lake. Data-driven organisations may have a keener appetite for data, but as that appetite grows they are finding that big data really needs big data architectures, and that analysing the contents of a data lake requires them to revisit their data and metadata disciplines in order to get anywhere. Why? Because simply dumping different datasets together, without first attempting to assess, understand or harmonise them in the glare of ‘does this make sense across the enterprise?’, means that a great deal of work will be needed to extract any meaningful analysis. Without this, all you will have is a stagnant data swamp.
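
As an illustration (a hedged sketch; the dataset names and figures are invented), this is what ‘simply dumping different datasets together’ looks like, and why no meaningful analysis comes out until the harmonisation work is done:

    # Hypothetical extracts from two systems, dumped into the lake as-is.
    sales_extract = [{"customer": "acme ltd", "revenue_gbp": 12000}]
    finance_extract = [{"cust_name": "Acme Limited", "revenue": 14500}]

    # A naive 'lake': just pile the records up together, unassessed.
    data_lake = sales_extract + finance_extract

    # Any enterprise-wide question now founders: the keys, the entity names
    # and the units were never harmonised, so the total below is meaningless.
    # Is 'revenue' in GBP? Are these the same customer, counted twice?
    total = sum(rec.get("revenue_gbp", rec.get("revenue", 0)) for rec in data_lake)
    print(total)  # 26500 of... what? A data swamp in miniature.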

It’s easy to understand the attractiveness and development of the data lake: ‘Put all your data here and we will take all your pain away by using ETL routines and getting the data scientists to look into the crystal ball!’ The problem with this is that it assumes a predesigned data warehouse schema, one antithetical to change. The evidence for this is that the approach is slow – fatally slow. Why? Because organisations must wait weeks, or even months, for any new data sources to be plumbed in.
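
A short sketch of why a predesigned schema is antithetical to change (the schema and the new source below are invented for illustration): every new source must be hand-mapped into a shape that was fixed long before the source existed:

    # Hypothetical warehouse schema, fixed at design time.
    WAREHOUSE_COLUMNS = ("customer_id", "order_date", "amount_gbp")

    def map_to_warehouse(row: dict) -> tuple:
        # Bespoke ETL mapping code like this has to be written, tested and
        # scheduled for every new source before its data becomes usable:
        # the weeks or months of 'plumbing in' referred to above.
        return (row["CustRef"], row["Date"], row["Value"])

    # A new source arrives carrying a field the schema never anticipated,
    # so it is silently dropped (or the whole schema has to be reworked).
    new_source_row = {"CustRef": "C-77", "Date": "2018-05-25", "Value": 99.0,
                      "ConsentFlag": True}
    print(map_to_warehouse(new_source_row))  # ConsentFlag never reaches the warehouse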

“Each application on its own has hundreds to thousands of tables and tens of thousands of attributes. These applications are very partially and very unstably “interfaced” to one another through some middleware that periodically schleps data from one database to another.” (Source: TDAN)

And why is that important? Because until the data has been wrangled, it can’t be used for analytics, particularly mash-ups or exploratory analytics hunting for data diamonds in the rough, or even for deciding whether it’s worth doing the wrangling at all!

The killer is that the avalanche of new data will soon overtake whatever wrangling or ‘interfacing’ is being done, clogging up the works with more dark or poor-quality data until the whole thing grinds to a halt!

Data-centric

The data-centric organisation is one that is outwardly focused, looking not only to its customers and enterprise, but towards its wider ecosystem and the data that feed them:

“Data centric refers to an architecture where data is the primary and permanent asset, and applications come and go.  In the data centric architecture, the data model precedes the implementation of any given application and will be around and valid long after it is gone.” (Source: TDAN)

Here, the overarching data model is critical: whatever applications, programmes, apps, functions, processes or code need the data, everyone is singing from the same hymn sheet. The data-centric organisation will have “a set of concepts and categories in a subject area or domain that shows their properties and the relations between them”; in other words, an ontology. What is then established is an ‘information framework’ that enables ‘science’ (Latin: scire = to know) to happen according to agreed corporate principles. The emphasis has shifted from enterprise infrastructure and application architecture to architecting the data itself.
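
By way of a hedged sketch only (the concepts and relations are invented, and a production implementation would more likely use a standard such as RDF/OWL), an overarching model can be as simple as agreed concepts, their properties, and the relations between them, validated by every application that touches the data:

    # A minimal sketch of an enterprise ontology: concepts, properties, relations.
    ONTOLOGY = {
        "Customer": {
            "properties": ["legal_name", "registered_address"],
            "relations": {"places": "Order", "party_to": "Contract"},
        },
        "Order": {
            "properties": ["order_date", "total_amount", "currency"],
            "relations": {"placed_by": "Customer"},
        },
    }

    def validate(concept: str, record: dict) -> bool:
        # Applications come and go, but each one validates its data against
        # the same shared model, so everyone sings from the same hymn sheet.
        expected = set(ONTOLOGY[concept]["properties"])
        return expected.issubset(record)

    print(validate("Customer", {"legal_name": "Acme Ltd",
                                "registered_address": "1 High Street"}))  # True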

Putting data-centric into action

Unless you are fortunate enough to have a green-field space, the likelihood is that becoming data-centric will have to be done incrementally: moving away from data chaos to patterned data until the data swamp is reduced to a puddle.

A significant task will be tackling the ontology hierarchy, which flows from governance, risk management, audit and policy considerations down to operational level, eventually manifesting as field-level metadata. (For instance, new metadata structures will be required for new regulatory imperatives, such as GDPR.) The cross-referencing implicit within the framework itself should help with the task of ascertaining the ‘unknown’, or what might need to be known in the future.
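
As a hedged sketch (the field names, owners and classification values are invented; real metadata would be distilled from your own governance framework), field-level metadata might record where each field sits in the hierarchy and what the regulatory imperatives require of it:

    # Hypothetical field-level metadata, the bottom of the ontology hierarchy.
    FIELD_METADATA = {
        "Customer.legal_name": {
            "owner": "Data Governance Board",       # flows down from governance
            "policy": "Data Protection Policy v2",  # ... via corporate policy ...
            "personal_data": False,
        },
        "Customer.contact_email": {
            "owner": "Data Governance Board",
            "policy": "Data Protection Policy v2",
            "personal_data": True,                  # ... down to the field itself
            "lawful_basis": "contract",             # GDPR-driven metadata
            "retention_years": 6,
        },
        "Customer.date_of_birth": {
            "owner": "Data Governance Board",
            "policy": "Data Protection Policy v2",
            "personal_data": True,                  # flagged, but not yet assessed
        },
    }

    # Cross-referencing the metadata surfaces the 'unknowns': fields holding
    # personal data with no recorded lawful basis yet.
    unknowns = [f for f, m in FIELD_METADATA.items()
                if m.get("personal_data") and "lawful_basis" not in m]
    print(unknowns)  # ['Customer.date_of_birth']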

And provided there is at least a skeleton framework in place (distilled from board directives, corporate policies, charters, and GRC considerations and oversight), the model can be developed over time, so you can work agilely with what is in front of you.