Data Mesh and Data Vault – Never the Twain shall meet?
- Andrew Griffin
- Jan 16, 2024
- 4 min read
How and why you should look to a Data Vault so your Data Mesh can deliver what it’s meant to
On the face of it, Data Vault 2.0 and Data Mesh are addressing different problems.
The former is centralised – in how data is transformed and loaded, and in who can access it – while the latter is decentralised, with individual domains within a business responsible for how and where data is captured and, equally importantly, who can see and use it.
Data Mesh has been much talked about in the data analytics world over the last couple of years, attracting the interest of more and more organisations seeking to avoid central bottlenecks and empower end-user departments.
But has it become something of the Emperor’s New Clothes, with many businesses claiming to have adopted a Data Mesh when, in practice, they have only partially done so?
It’s important to remember there isn’t an out-of-the-box solution available in the marketplace.
Data Vault User Group chairman, Neil Strange, considered the requirements for a successful Data Mesh implementation, concluding that, far from the two being in conflict, Data Vault fits well into a Data Mesh framework and can underpin a successful implementation.
To start with, both Data Mesh and Data Vault operate within the analytics plane – as opposed to the operational plane – concentrating on business reporting, decision-making, analytics, data science and forecasting rather than actually running the business.
How do you define a Data Mesh?
Interestingly, if you try to find an actual definition of what a Data Mesh is, you will find organisations from Amazon to Microsoft producing quite wildly different descriptions, and certainly not short or simple ones. The criticisms levelled at traditional business intelligence solutions are that they:–
Deliver too slowly to meet business needs and often fail to understand the business in detail
Create centralised bottlenecks
Require high skill levels and deep specialisms to work with data
Stifle innovation
Frequently fail because of schema drift and poor-quality input data.
So is there an alternative?
Data Mesh aims to:–
Decentralise data
Make data and the platform a service – reducing skill requirements for users
Remove administrative barriers
Empower teams to focus on their data requirements and business needs
Strengthen governance to provide control and avoid anarchy.
It can best be summed up in O’Reilly’s book on Data Mesh.
So while Data Mesh’s scope covers four key concepts – domain ownership, data products, federated governance and the use of a self-service platform – Data Vault’s scope is more closely focused on data integration and data products.
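Purely as an illustration – the field names below are my own, not part of any Data Mesh standard or tool – you can picture those four concepts as metadata carried by each data product:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Hypothetical metadata record illustrating the four Data Mesh concepts."""
    name: str                                   # the data product itself
    owning_domain: str                          # domain ownership
    governance_policies: list[str] = field(default_factory=list)  # federated governance
    platform_location: str = ""                 # where the self-service platform serves it from

# Example: a sales domain publishing a revenue product
revenue = DataProduct(
    name="monthly_revenue",
    owning_domain="sales",
    governance_policies=["gdpr:no-pii", "retention:7y"],
    platform_location="analytics_platform/sales/monthly_revenue",
)
```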
What is a data product exactly?
Neil’s presentation then focused on data products in detail. Without a centrally enforced standard, teams can choose how to build their products – on anything from Excel to Snowflake, using Kimball, data lakes, or even a Data Vault. The realistic aim is to use a Data Vault in some domains, particularly those that need the extra maturity and rigour it can bring to a data product.

Patrick Cuba, author of 'The Data Vault Guru', has suggested in a recent article that a Raw Vault and Business Vault can be constructed inside the domain, with the data exposed at the far end as a data product. There is nothing to prevent the landscape of a typical data warehouse – which produces anything from 5,000 to 20,000 reports – being replaced with a small, very focused handful of data products that can still serve corporate reporting needs.
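As a rough sketch of that pattern – the table and column names here are invented for illustration, not anyone’s actual schema – a domain might keep its Raw Vault tables private and expose only a derived, current view as its data product:

```python
import pandas as pd

# Raw Vault inside the domain: a hub of business keys plus a satellite of
# descriptive attributes, each row stamped with its load date for history and audit.
hub_customer = pd.DataFrame({
    "customer_hk": ["a1", "b2"],
    "customer_id": ["C001", "C002"],
    "load_date":   pd.to_datetime(["2024-01-01", "2024-01-01"]),
})

sat_customer_details = pd.DataFrame({
    "customer_hk": ["a1", "a1", "b2"],
    "segment":     ["SMB", "Enterprise", "SMB"],
    "load_date":   pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-01"]),
})

def customer_data_product() -> pd.DataFrame:
    """Expose only the current view of each customer as the domain's data product."""
    latest = (sat_customer_details
              .sort_values("load_date")
              .groupby("customer_hk", as_index=False)
              .last())
    return hub_customer.merge(latest, on="customer_hk", suffixes=("", "_sat"))

print(customer_data_product())
```

The history stays inside the domain; consumers only ever see the published view.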
Thus you can avoid mass duplication of data sets by different arms of the business. This is achieved through a network of data products, many of which can be shared by the different arms of the organisation.
A data product catalogue lets users search for products, which can then be shared through a data marketplace, with a separate data schema published for each product – avoiding the need to move large files around and controlling who has permission to view them.
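To make that concrete, here is a hypothetical sketch of a catalogue lookup and permission check – the CATALOGUE structure and function names are illustrative, not a real marketplace API:

```python
from dataclasses import dataclass

@dataclass
class CatalogueEntry:
    """Hypothetical catalogue record: one schema and one access list per product."""
    product: str
    schema: dict[str, str]      # column name -> type, published instead of the data itself
    allowed_roles: set[str]     # who may be granted a view of the product

CATALOGUE = [
    CatalogueEntry("monthly_revenue", {"month": "date", "revenue": "decimal"}, {"finance", "exec"}),
    CatalogueEntry("customer_churn",  {"customer_id": "string", "churned": "bool"}, {"marketing"}),
]

def search(term: str) -> list[CatalogueEntry]:
    """Find products by name without moving any of the underlying data."""
    return [entry for entry in CATALOGUE if term in entry.product]

def can_view(entry: CatalogueEntry, role: str) -> bool:
    """Permission check before a share is granted through the marketplace."""
    return role in entry.allowed_roles

hits = search("revenue")
print(hits[0].schema, can_view(hits[0], "finance"))
```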
Data products can be more selectively managed and invested in – and, if well designed, can comfortably shrink a traditional reporting set-up of, say, 60 reports down to something more like half a dozen data products, while still meeting corporate reporting requirements.
Data Vault takes care of integration
Neil argued that if you are creating data products, you might as well use a Data Vault, which takes care of integrating the data from the various source feeds, building a point-in-time history and an audit trail. It also allows multiple data products to be cut from the same underlying model.

One thing that quickly becomes clear in a Data Mesh is that a number of data elements repeat themselves across different domains. With some definition of the components that make up those data products, an integrated data model can be used to share those common components across the domains and achieve consistency in the results – an approach sketched below. Neil also gave examples of how his work at the UK government’s Department for Education more than a decade ago showed how a system that exchanges data easily can be built to help with interoperability and integration.

In conclusion, Neil believes a domain-based Data Mesh can work effectively by incorporating a Data Vault to provide the semantic integration that helps drive standards in the framework.
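As a rough illustration of that shared-component idea – assuming the common Data Vault convention of hashing business keys to form hub keys (the MD5 choice and field names here are illustrative) – two domains that agree on the business key definition generate identical hub keys, and their data products integrate without any mapping tables:

```python
import hashlib

def hub_key(business_key: str) -> str:
    """Derive a hub key from a business key; every domain applies the same rule."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

# The sales and support domains each load their own satellites,
# but both reference the same shared Customer hub key.
sales_row   = {"customer_hk": hub_key("c001"),  "last_order": "2024-01-12"}
support_row = {"customer_hk": hub_key("C001 "), "open_tickets": 2}

# Because both rows hash the same business key, they join consistently across domains.
assert sales_row["customer_hk"] == support_row["customer_hk"]
print(sales_row["customer_hk"])
```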