
Refactoring Data Vaults with Ontologies

  • Andrew Griffin
  • Jun 27, 2022

Using Ontologies to refactor Data Vaults


It’s a question bounced across boardroom tables around the world… how can we turn the information contained in a data platform into meaningful business intelligence and insights, and, more importantly, how much will it cost?


Well, according to Richard Strange, who is currently completing a PhD on using artificial intelligence to help predict earthquakes, you can start such a transformation project relatively cheaply.


In fact, he reckons it can be done for the price of a box of stickies, a whiteboard, tea, coffee and biscuits, plus a mobile phone to take some pictures.


So while the answer on cost is “not a lot”, what you do have to invest, Richard stresses, is a lot of time talking to your business users in order to create a domain ontology / taxonomy of how your organisation or enterprise operates.


So, how do you go about defining the semantics of your business or organisation? The process is the same for both.


User-view ontologies – see the example below on using a medical bioethics structure – are not what is required.


You have to take a systems approach to reflect a source-system Data Vault in a way that adds value to your business from your data.

In making your own domain ontology, it is important to remember that entities are real – even if they are abstract concepts – but you cannot capture everything in an ontology, so it needs to be strictly scoped.


More importantly, your ontology will change as you learn more and, equally, as the business changes, so track changes to the ontology.

Richard’s simple four steps are:

  1. Define the scope of your ontology

  2. Research your business text, interviews and existing frameworks – 50 terms is a good start, but make sure they are defined

  3. Order the entities into a hierarchy from general to specific definitions

  4. Reorganise the ontology by readability and logical coherence.
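The four steps above can be sketched in a few lines of Python. This is only an illustration, not Richard's tooling: the `Entity` class, the two helper functions and the retail-style terms are all hypothetical, chosen to show defined terms (step 2) being ordered from general to specific (step 3).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """One term in the domain ontology (hypothetical structure)."""
    name: str
    definition: str
    parent: Optional[str] = None  # the more general entity; None for a root


def undefined_terms(entities):
    """Step 2: list the terms that still lack a definition."""
    return [e.name for e in entities if not e.definition.strip()]


def render_hierarchy(entities):
    """Step 3: indent each entity beneath its more general parent."""
    children = {}
    for e in entities:
        children.setdefault(e.parent, []).append(e.name)
    lines = []

    def walk(parent, depth):
        for name in sorted(children.get(parent, [])):
            lines.append("  " * depth + name)
            walk(name, depth + 1)

    walk(None, 0)
    return lines


# Illustrative terms only; a real workshop would supply around 50 of these.
ENTITIES = [
    Entity("Party", "A person or organisation the business deals with"),
    Entity("Customer", "A party that buys goods or services", parent="Party"),
    Entity("Supplier", "A party that provides goods or services", parent="Party"),
    Entity("Agreement", "A commitment between parties"),
    Entity("Order", "", parent="Agreement"),  # captured, but not yet defined
]

print("Undefined:", undefined_terms(ENTITIES))
print("\n".join(render_hierarchy(ENTITIES)))
```

Keeping a file like this under version control is one simple way to track how the ontology changes as the business does.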


Richard then gave some examples from the work of volcanologists around the world observing volcanoes and the methods used to create the most effective ontology to analyse the reams of data collected from the observatories.

He then reminded the room that source-system Data Vaults lack semantic understanding, integrate poorly, and run against the advice of Data Vault creator Dan Linstedt.


As a result, a source-system Data Vault does not follow the business. To refactor one, run semantic workshops in which the business users talk through what the crucial entities are and build up their own whiteboard, and photograph it at each step.

That’s where those stickies and the refreshments come in.


Aim for the “low-hanging” analytical fruit first… if your project creates demand for standard reports, then do those first.


After finding your first exploration link you can backfill the refactoring work upstream in the raw vault and staging area, before repeating the process.


Familiar work patterns for any right-to-left Data Vault build will emerge as you find the next interesting exploration link, or critical report, before moving on to the next semantic area.


Reconciliation testing is vital as you check your data model against the semantic diagram.

Having completed that work, your business or organisation should be ready to find new information via data analysis.


Watch out for whether the data model has truncated any important part of the business model, or for unexpected connections between non-adjacent business areas affecting each other.
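The reconciliation check and the two warning signs above can be sketched as a set comparison. This is a minimal illustration under stated assumptions: the relationships on the whiteboard and in the data model are both reduced to pairs of entity names, and every name, link and business area here is made up.

```python
def reconcile(semantic_links, model_links, area_of):
    """Compare whiteboard relationships with those present in the data model."""
    # Links are treated as undirected: (A, B) and (B, A) are the same link.
    semantic = {frozenset(link) for link in semantic_links}
    model = {frozenset(link) for link in model_links}

    missing = semantic - model      # drawn by the business, absent from the model
    unexpected = model - semantic   # in the model, never drawn by the business
    # Unexpected links whose two ends sit in different business areas.
    cross_area = {
        link for link in unexpected
        if len({area_of[entity] for entity in link}) > 1
    }

    def as_pairs(links):
        return sorted(tuple(sorted(link)) for link in links)

    return as_pairs(missing), as_pairs(unexpected), as_pairs(cross_area)


# Hypothetical inputs: the photographed semantic diagram and the built model.
SEMANTIC_LINKS = [("Customer", "Order"), ("Order", "Product"), ("Product", "Supplier")]
MODEL_LINKS = [("Customer", "Order"), ("Order", "Product"), ("Customer", "Supplier")]
AREA_OF = {
    "Customer": "Sales",
    "Order": "Sales",
    "Product": "Catalogue",
    "Supplier": "Procurement",
}

missing, unexpected, cross_area = reconcile(SEMANTIC_LINKS, MODEL_LINKS, AREA_OF)
print("Truncated from the model:", missing)
print("Not on the whiteboard:", unexpected)
print("Spanning business areas:", cross_area)
```

A non-empty first list suggests the data model has cut off part of the business model; the last list is a prompt to go back to the business users rather than proof of an error.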


You should now have a situation where data engineers can create an exploration link and mart to answer questions from the business, allowing the data scientist to play with that data, leading to the creation of a final model which can be trained and assessed.


Database vectorisation is also possible – allowing algorithms to operate on a set of values rather than a single one, and permitting textual analysis of a collection of business documents using PCA (principal component analysis) of keywords.
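As a rough illustration of PCA over keywords, the sketch below turns four made-up documents into keyword-count vectors and projects them onto their first principal component. It is dependency-free for the sake of the example and uses power iteration to find that component; in practice a library PCA routine would normally be used, and nothing here is taken from the presentation itself.

```python
import math


def keyword_matrix(docs, keywords):
    """One row per document, one column per keyword count."""
    return [[doc.lower().split().count(k) for k in keywords] for doc in docs]


def centre(rows):
    """Subtract each column's mean so PCA measures variation, not size."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, k = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(k)]
            for i in range(k)]


def first_component(cov, iterations=200):
    """Power iteration: converges to the dominant eigenvector of cov."""
    k = len(cov)
    # Deliberately non-uniform start: a uniform vector can be orthogonal
    # to the dominant direction of centred data.
    v = [float(i + 1) for i in range(k)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


# Hypothetical documents and keywords, for illustration only.
KEYWORDS = ["invoice", "payment", "shipment", "delivery"]
DOCS = [
    "invoice payment overdue invoice",
    "payment received against invoice",
    "shipment delayed delivery rescheduled",
    "delivery confirmed shipment delivery",
]

centred = centre(keyword_matrix(DOCS, KEYWORDS))
component = first_component(covariance(centred))
scores = [sum(x * c for x, c in zip(row, component)) for row in centred]
for doc, score in zip(DOCS, scores):
    print(f"{score:+.2f}  {doc}")
```

The two finance-flavoured documents land on one side of zero and the two logistics ones on the other; which side is which is arbitrary, because the sign of a principal component carries no meaning.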

Watch Richard's full presentation above.
