Data Vault and Data Mesh - Can they really work together?
- Andrew Griffin
- Jan 5, 2023
- 4 min read
How to get the best of both worlds by combining Data Vault and Data Mesh
Data Mesh has been the buzzword in data analytics over the last 12-18 months, but does the latest fashion for a distributed architecture and federated governance mean the use of the Data Vault methodology is threatened? Working examples of a full Data Mesh implementation are still the exception rather than the rule, not least because Data Mesh advocate Zhamak Dehghani’s book, ‘Data Mesh: Delivering Data-Driven Value at Scale’, was only launched in early 2022.
Interestingly, the most prominent example so far has been implemented by Swiss pharmaceutical giant Roche. Roche Diagnostics has retained the strengths that a Data Vault brings, while finding a way to save millions on inventories and resource optimisation by embracing Data Mesh. Roche’s head of data management platforms, Paul Rankin, explained the pharma giant’s thinking behind marrying a Data Mesh with a Data Vault when he presented at the latest Data Vault User Group meet-up. Paul posed the question:
Data Mesh and Data Vault – Can they really work together?
He explained why Roche was seeking an alternative way of carrying out its data analytics functions, and what a Data Mesh offered. Roche’s management team were increasingly frustrated by their legacy solution. Typically, it would take up to three months to scale up compute for any major change to the data analytics set-up, and three to four months before any changes could be implemented, resulting in delays to market insights and to the value gained from new releases.
Monolithic Data Warehouse bottlenecks and delays
Paul revealed that two or three major incidents or outages a year were quite common, either because of pipeline problems or corruption within the database.
Four years ago, Roche’s classic business intelligence set-up was both monolithic and very hard to change. Nearly all data sources were in-house for the multi-national healthcare company. The multiple servers central to Roche’s operations were hard to maintain and very slow to scale up, and bottlenecks within the central IT team affected every part of the business at one time or another.
Simply switching to the Cloud four years ago failed to bring the expected benefits: while it provided unlimited storage capacity and increased computational power, it did not address some of the underlying issues.
These issues left big question marks over the reliability and performance of Roche’s data management approach. Paul’s boss, Omar Khawaja, global head of BI diagnostics, is a firm believer in the old adage, often attributed to Albert Einstein, that the definition of insanity is “doing the same thing over and over again – and expecting different results.”
As a result, 22 months ago Roche took the rather revolutionary decision to build a Data Mesh. In the “Brave New World”, business domains became responsible for their own data products, and this domain-orientated ownership was key to the paradigm shift Roche committed to. Beyond the big change in mindset regarding ownership of those data products, the next challenge was to build the self-service data platforms, which required the construction of pipelines carrying data from source to consumption. In the process, the number of data users at Roche increased 10-fold.
Paul advises anyone looking at a Data Mesh implementation to do two things:
1. Test how mature your business is, to make sure a Data Mesh is the right solution.
2. Start small with one or two parts of the business – such as production or finance.
The latter point will help ensure buy-in from the rest of the business, as the advantages are seen and shared.
What does a data product team look like?
Based on Roche’s experiences, a data product team can have 10 to 15 members, but can be as small as three or four. A typical team is cross-functional, with a data architect, a data product manager, a data product owner and a governance lead, plus three or four developers and data engineers.
As a result, Paul said: “You will have the skills and the autonomy to do exactly what you want, whenever you want, in your own data product team.” The data product teams at Roche are now responsible for ingestion and the end-to-end build. Each team can create its own hubs and satellites as Data Vault artefacts, which are data products in themselves. These artefacts can be reused, and other domain teams may find value in them too; they can also be combined with other teams’ data products, creating, for example, a master customer data product.
Paul revealed he had discussed this concept with Data Vault inventor Dan Linstedt at last year’s World Wide Data Vault Conference in the USA. “Initially you would think it goes against the Data Vault model, but it’s not just a distributed model. Instead, it’s distributed ownership with artefacts owned by different teams,” Paul told the Data Vault User Group. In fact, Dan Linstedt said he had used the same solution in some of his own use cases, but did not include it in his book ‘Building a Scalable Data Warehouse with Data Vault 2.0’ simply because it added to the complexity of what he was advocating and would not be applicable to everyone.
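To make the distributed-ownership idea concrete, here is a minimal sketch of how it could work, not Roche’s actual implementation: two domain teams each own a satellite hanging off a shared customer hub, and a consuming team joins those artefacts into a combined customer data product. All table and column names (hub_customer, sat_customer_finance and so on) are hypothetical, and plain pandas DataFrames stand in for the real warehouse tables.

```python
# Illustrative sketch of distributed ownership in a Data Vault:
# a shared hub plus satellites owned by different domain teams.
# Hypothetical names and data; DataFrames stand in for warehouse tables.
import pandas as pd

# Shared hub: one row per business key, published by (say) a master-data domain.
hub_customer = pd.DataFrame({
    "customer_hk": ["a1f3", "b2c9"],     # hash key derived from the business key
    "customer_id": ["C-001", "C-002"],   # business key
})

# Satellite owned by the finance domain team.
sat_customer_finance = pd.DataFrame({
    "customer_hk": ["a1f3", "b2c9"],
    "credit_limit": [50_000, 12_000],
})

# Satellite owned by the logistics domain team.
sat_customer_logistics = pd.DataFrame({
    "customer_hk": ["a1f3", "b2c9"],
    "preferred_site": ["Basel", "Mannheim"],
})

# A consuming team reuses all three artefacts, joining on the hub's hash key
# to assemble a combined ("master") customer data product.
master_customer = (
    hub_customer
    .merge(sat_customer_finance, on="customer_hk")
    .merge(sat_customer_logistics, on="customer_hk")
)
print(master_customer)
```

The hub’s hash key gives every team a stable join point, which is why the artefacts can be owned, versioned and published independently and still be recombined downstream; a production Data Vault would of course add load dates, record sources and history to each satellite.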
What does the future look like for Roche?
Roche already has 330 data product teams producing 50 data products, consuming nearly 600 terabytes of storage for the 2,680 developers and users involved.
The biggest challenge remains a reluctance to own data products, because ownership creates more work and adds to costs in the short term.
The governance of shared data is still imperfect, so Roche is looking at ways to incentivise teams to publish more data, possibly by giving more IT budget to reward domains that push on with the “Brave New World” of Data Mesh. However, the successes far outweigh any shortcomings so far, Paul stressed. Roche can point to improved:
- Data quality
- Accountability
- Agility and speed of delivery
Data silos are falling away, and each platform team can now focus on delivering capabilities, standards and accelerators – with the strengths of Data Vault central to achieving those improvements in a well-coordinated way.
Summary
Paul was yet another great presenter for our monthly Data Vault User Group meet-ups. Sign up for our next meet-up here. Juha Korpela is presenting “Capture your business needs with conceptual Data Modelling” on the 18th of January 2023. You can watch Paul Rankin’s full presentation above.