Content, the unknown ‘unknown’?

Do you ‘know your content’? Organisations typically only ‘know’ 20% of their digital content. Your searchable content may not be as large as Google’s, but 80% of your digital estate is still a lot of knowledge value going begging!

Sir Tim Berners-Lee proposed the Semantic Web as a way of applying logic to the vastness of the web. More recently, and for similar reasons (to filter web content logically at scale), Google developed its Knowledge Graph. As the most visited website in the world, they ought to know, right?


Google indexes billions of web pages and holds well over 70% of the search market. Interested yet? With numbers of that kind, federated search isn’t the way forward. Plus, the sheer volume of unstructured content makes federated search unfeasible at scale: the queries can clog your network and drain compute resources. Querying content at scale is also an operational time sink, so it carries the opportunity cost of time spent waiting. Federated search at scale is therefore neither operationally nor financially effective. It is also too rigid.


Federated searches typically don’t rank results: they are just search queries, so the zillions of results still need to be sifted for relevance by a human being, which is massively inefficient. Human beings have physical limitations that computing tools enable them to overcome. If we program machines with the right logic in the right way, they can automatically process a greater breadth of material than a human being, and far more quickly. Better still, knowledge graph technology can filter and rank those results.
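To illustrate the difference ranking makes, here is a minimal Python sketch, assuming documents have already been tagged with concepts and that those concepts are linked in a small graph. The concept names, document ids, the one-hop neighbourhood and the 0.5 `neighbour_weight` are all hypothetical choices for illustration, not a description of how any particular knowledge graph product scores results.

```python
# Hypothetical concept graph: each concept maps to the concepts it is linked to.
CONCEPT_LINKS = {
    "customer": {"contract", "invoice"},
    "contract": {"customer", "supplier"},
    "invoice": {"customer"},
    "supplier": {"contract"},
}

# Hypothetical documents, each already tagged with the concepts it mentions.
DOCUMENTS = {
    "doc-a": {"customer", "invoice"},
    "doc-b": {"supplier"},
    "doc-c": {"invoice"},
    "doc-d": {"customer"},
}


def rank(query_concept, documents, links, neighbour_weight=0.5):
    """Score each document 1.0 for tagging the query concept itself, plus
    `neighbour_weight` for every tag one hop away in the concept graph.
    Documents scoring zero are dropped; the rest are sorted best first."""
    neighbours = links.get(query_concept, set())
    scores = {}
    for doc_id, tags in documents.items():
        score = 1.0 if query_concept in tags else 0.0
        score += neighbour_weight * len(tags & neighbours)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(rank("customer", DOCUMENTS, CONCEPT_LINKS))
# [('doc-a', 1.5), ('doc-d', 1.0), ('doc-c', 0.5)]
```

Here `doc-a` (a direct match that also mentions an invoice) comes first, and `doc-c` still surfaces even though it never mentions ‘customer’, because it is connected to it through ‘invoice’. A plain keyword query for ‘customer’ would have missed `doc-c` and returned the rest unranked.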


Knowledge isn’t just the representation of facts, in whatever order; it is inferred by making connections and testing hypotheses. Structured data is more valuable when informed by semantic nuances extracted from unstructured data. The data is given meaning (context, relationships, lineage), all of it auditable. An organisational knowledge foundation is a powerful way to get meaning from all types of digital assets.
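Inference with auditable lineage can be sketched with a single rule. The Python below is a minimal sketch, assuming facts are stored as (subject, predicate, object) triples and that the hypothetical `part_of` relation is transitive; it forward-chains that rule and records which two facts produced each new one. Real reasoners support many rules and are far more efficient; this only shows the idea.

```python
# Hypothetical facts stored as (subject, predicate, object) triples.
FACTS = {
    ("invoice-42", "part_of", "order-7"),
    ("order-7", "part_of", "account-acme"),
    ("account-acme", "part_of", "region-emea"),
}


def infer_transitive(facts, predicate):
    """Forward-chain the rule (a P b) and (b P c) => (a P c) for one predicate
    until nothing new appears. Returns {inferred_triple: (premise1, premise2)}
    so that every derived fact can be traced back to the facts behind it."""
    known = set(facts)
    lineage = {}
    changed = True
    while changed:
        changed = False
        # Snapshot the matching pairs before adding anything to `known`.
        pairs = [
            (first, second)
            for first in known
            for second in known
            if first[1] == predicate
            and second[1] == predicate
            and first[2] == second[0]
        ]
        for first, second in pairs:
            derived = (first[0], predicate, second[2])
            if derived not in known:
                known.add(derived)
                lineage[derived] = (first, second)
                changed = True
    return lineage


for fact, premises in sorted(infer_transitive(FACTS, "part_of").items()):
    print(fact, "<=", premises)
```

The sketch derives, for example, that `invoice-42` is part of `region-emea`, a fact nobody stated explicitly, and the returned dictionary records the pair of facts behind each derived triple.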


Machines can’t make leaps of imagination; only humans can. That is why semantics and natural language matter so much in opening up knowledge. If we are to stimulate invention and do AI properly, we need to use vocabulary that is meaningful to people, not just machines.


Traditional analytics tend to be retrospective: analysing historical content, rather than looking for new connections, new knowledge, or new value.


Just as you can determine the relevance of search results, what about selecting content for archiving or business-critical backup? If your content runs to exabytes, how do you decide what to back up? What is critical to the business, and how is that defined and tagged? Storage carries direct and indirect maintenance costs. Prioritising what you need to keep could yield savings, particularly given current computer supply-chain issues and constrained budgets. The days of backing up everything, regardless, may be gone.
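As a rough illustration of tag-driven backup selection, the Python sketch below assumes each content item carries ontology tags and that the business has agreed a criticality weight per tag. The tag names, weights, item names and terabyte budget are hypothetical, and a greedy pass is only one simple way to spend a budget (it is not guaranteed to be optimal).

```python
# Hypothetical criticality weight per ontology tag (0 = do not back up).
# In practice these would come from agreed business rules.
CRITICALITY = {"regulatory-record": 3, "customer-data": 2, "draft": 0}


def priority(tags):
    """An item's priority is the highest weight among its tags. Tags with no
    agreed weight, and untagged items, default to 1 so nothing is dropped
    silently just because it has not been classified yet."""
    return max((CRITICALITY.get(tag, 1) for tag in tags), default=1)


def plan_backup(items, budget_tb):
    """Greedy sketch: visit items by descending priority (smaller items first
    within a priority) and keep each one that still fits in the budget.
    Each item is (name, size_tb, tags). Returns (selected names, TB used)."""
    ranked = sorted(items, key=lambda item: (-priority(item[2]), item[1]))
    selected, used_tb = [], 0.0
    for name, size_tb, tags in ranked:
        if priority(tags) == 0:
            continue  # explicitly marked as not worth backing up
        if used_tb + size_tb <= budget_tb:
            selected.append(name)
            used_tb += size_tb
    return selected, used_tb


ITEMS = [
    ("ledger", 4.0, {"regulatory-record"}),
    ("crm-export", 3.0, {"customer-data"}),
    ("wiki", 2.0, {"intranet"}),
    ("old-drafts", 5.0, {"draft"}),
]
print(plan_backup(ITEMS, budget_tb=8.0))  # (['ledger', 'crm-export'], 7.0)
```

With the 8 TB budget, the ledger and CRM export are kept, the wiki no longer fits, and the drafts are skipped by design. The weights, not the code, carry the business judgement about what is critical.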


The corset-like rigidity of some enterprise applications, and the opacity of their data structures, make it difficult to take anything other than an application-centric approach, which immediately fractures organisational knowledge and value. By analogy, the buildings most resilient to earthquakes are those built with a flexible framework. A knowledge graph would provide such a framework, one capable of evolving over time to meet changing regulatory requirements, organisational change and conceptual growth. Rigid application constraints and process specificity militate against this.


With the increasing proliferation of regulation, it is a wonder that any Chief Digital, Chief Data or Chief Compliance Officer gets to sleep at night. These regulations change, overlap, sometimes conflict and can carry severe penalties. If you do not also have a cohesive, easily accessible view of your domain content, guaranteeing compliance becomes more of an aspiration than a certainty. Tagging your content semantically, according to the framing ontology, improves its discoverability and governability, and therefore the enforceability of corporate policies. This helps to address the imbalance between responsibility and accountability.
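Why tagging against an ontology helps enforcement can be shown through inheritance: tag an item with a narrow concept and it picks up the policies attached to every broader concept above it. The Python sketch below assumes a simple hierarchy in which each concept has at most one broader concept (real ontologies often allow several); every concept and policy name in it is hypothetical.

```python
# Hypothetical fragment of a framing ontology: concept -> its broader concept.
BROADER = {
    "customer-email": "personal-data",
    "payroll-record": "personal-data",
    "personal-data": "regulated-data",
    "board-minutes": "regulated-data",
}

# Hypothetical policies attached at different levels of the hierarchy.
POLICIES = {
    "personal-data": {"restrict-access", "honour-erasure-requests"},
    "regulated-data": {"retain-for-statutory-period"},
}


def policies_for(concept):
    """Collect the policies for a concept and every broader concept above it,
    so an item tagged with a narrow concept inherits them all."""
    applicable, seen = set(), set()
    while concept is not None and concept not in seen:
        seen.add(concept)  # guards against a cycle in a malformed hierarchy
        applicable |= POLICIES.get(concept, set())
        concept = BROADER.get(concept)
    return applicable


print(sorted(policies_for("customer-email")))
# ['honour-erasure-requests', 'restrict-access', 'retain-for-statutory-period']
```

An item tagged only as `customer-email` therefore inherits access, erasure and retention policies without anyone having to remember to apply each one; change a policy on `personal-data` and every narrower concept follows.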


The language of applications is often opaque, especially in ERP systems. With each piece of data given a unique reference and enriched by semantic tags, you can be precise about that data and its meaning. A semantic model makes it possible to unify concepts across the domain, resolving what could be multiple identifiers for key information (such as the concept of a ‘customer’) into a single, shared definition.
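Resolving several identifiers into one shared definition is, at its core, an equivalence problem, and a union-find structure is one common way to compute it. The Python sketch below assumes that ‘same as’ links between system identifiers have already been asserted (by a data steward or a matching process); the identifier formats and the `kg:` prefix are hypothetical.

```python
class IdentityResolver:
    """Union-find over identifiers from different systems. Each same-as link
    merges two equivalence classes; find() returns the canonical identifier
    of the class an identifier belongs to."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        self.parent.setdefault(ident, ident)
        root = ident
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every identifier on the path at the root.
        while self.parent[ident] != root:
            next_ident = self.parent[ident]
            self.parent[ident] = root
            ident = next_ident
        return root

    def same_as(self, canonical, alias):
        """Assert that `alias` means the same thing as `canonical`; the
        canonical side's root stays canonical for the merged class."""
        root_canonical, root_alias = self.find(canonical), self.find(alias)
        if root_canonical != root_alias:
            self.parent[root_alias] = root_canonical


resolver = IdentityResolver()
# Hypothetical identifiers for one customer held in different systems.
resolver.same_as("kg:customer/acme", "erp:CUST-0042")
resolver.same_as("erp:CUST-0042", "crm:ACC-1001")
print(resolver.find("crm:ACC-1001"))  # kg:customer/acme
```

Both the ERP and CRM identifiers now resolve to one canonical id, while an identifier nobody has linked yet remains its own class until a link is asserted.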

Preparing the foundations

Inventories and catalogs are useful in laying some of the necessary (structured) data and metadata groundwork for knowledge graphing. They can help you get to ‘know’ your content, as the process may also identify remediation targets. But they are best done once you have thought through your organisation’s key ontological concepts, have some idea of potential semantic descriptors and have a view on how you would weight indicators. Otherwise you may have to reconfigure them. Also, classification is only part of the overall process of extracting knowledge from content. It’s just a way of tidying your digital cupboard, not an end in itself.
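The point about agreeing key concepts first can be made concrete with a simple catalog check. The Python sketch below assumes each catalog entry records a path, an owner and a set of concept tags, and that the organisation has agreed a list of key concepts; the entries, concepts and reason labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Hypothetical key ontological concepts agreed before cataloguing starts.
KEY_CONCEPTS = {"customer", "contract", "product"}


@dataclass
class CatalogEntry:
    path: str
    owner: Optional[str] = None
    concepts: Set[str] = field(default_factory=set)


def remediation_targets(entries, key_concepts=KEY_CONCEPTS):
    """Return (path, reason) pairs for entries that need attention: no owner,
    no concept tags, or tags outside the agreed set of key concepts."""
    issues = []
    for entry in entries:
        if entry.owner is None:
            issues.append((entry.path, "no owner"))
        if not entry.concepts:
            issues.append((entry.path, "untagged"))
        elif not entry.concepts <= key_concepts:
            issues.append((entry.path, "unknown concept"))
    return issues


CATALOG = [
    CatalogEntry("/finance/q3-contracts.xlsx", "finance", {"contract"}),
    CatalogEntry("/shared/misc.docx"),
    CatalogEntry("/sales/leads.csv", "sales", {"lead"}),
]
for path, reason in remediation_targets(CATALOG):
    print(path, "->", reason)
```

An entry tagged with a concept outside the agreed list (here, `lead`) is exactly the kind of entry you would have to reconfigure if the catalog were built before the ontology was thought through.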

Generate meaning from the relationships & contexts of your structured & unstructured content. Use the power of knowledge graphing to rank the knowledge value of your ‘unknown’ unknowns. Prepare the way for real AI: automated reasoning.