The things I wish I knew before I started my first Data Vault Project
- Andrew Griffin
- Jan 7, 2021
- 5 min read
Our panel of industry experts gave their top tips for those embarking on a Data Vault project. Hear from Iliana Iankoulova and Ana Narciso of Picnic, Michael Magalsky of InfoVia, Gabor Gollnhofer of Meta Consulting, Dirk Vermeiren of VaultSpeed, and Neil Strange of Datavault.
Some have attributed the above quote to American literary great Mark Twain.
In fact it was penned by another American – not in the 19th century, but in 1990 – by Life’s Little Instruction Book author H. Jackson Brown.
Plenty of people have embarked on building an enterprise data warehouse from scratch, or constructed a replacement for a legacy database, in the decades since. But how many of them would like to turn back the clock and start again, or venture down a different route – especially by making better use of Data Vault and, more recently, Data Vault 2.0 methods?
Hindsight is of course a wonderful thing. And the pace of change in the workplace – with digital transformation and all it entails – means there are still scores of opportunities to transform your company’s data analytics. So it makes sense to benefit from the lessons learned by pioneers who went before.
‘The Things I Wish I Knew Before I Started My First Data Vault Project’ was the subject of the latest Data Vault User Group meeting.
The presentations from six experts focused on the three most important lessons gleaned during their “Data Vault journey.”
Speaking from Hungary, Meta Consulting’s Gabor Gollnhofer has been building enterprise databases since 1992 and his first data warehouse in 1996. Having worked in everything from finance, banking and insurance, to telecoms and education, Gabor warned that Data Vault is not a silver bullet.
Engaging end-users and developers in the process requires the project team to convince them of, and demonstrate, the benefits of using Data Vault 2.0 today. That calls for good training on how to “use and not abuse” the concept – all of which takes time, the one commodity many businesses and organisations believe they do not possess.
Modern enterprises, able to capitalise on the benefits of the cloud from day one, are in a different position to established companies, where the challenge is as much dealing with the past as with the present.
Online Dutch supermarket Picnic is driving down costs in the race to be the most competitively priced in the sector, with free delivery. Picnic Product Owner Ana Narciso and Tech Lead Iliana Iankoulova spoke about the lessons they have learned using Data Vault. Their comprehensive step-by-step explanation of how Picnic works featured key elements of how they use a Data Vault to drive their operations. With 120 data sources and nine data engineers, Picnic also employs 10 data scientists and 150 analysts, out of a workforce of more than 5,000. The team monitors a delivery system that supports more than 100 drivers on the road, with a data warehouse modelled into 960 Data Vault tables and 9,480 columns!
Their cloud system takes care of everything from purchasing to stock picking from the physical warehouses, to tracking and resolving routing for the individual deliveries using an all-electric fleet in the Netherlands and Germany. Low-maintenance automation helps deal with thousands of jobs every day – and having raw source data in the database helps with debugging.
The Picnic system is designed to have no code duplication, with pipelines requiring only extraction code and target table configuration.
A key lesson is ‘Build your Data Vault for the future, without predicting changes.’
It took a year to construct. But data collected over a longer and longer period helps make sense of seasonality in the food sector, adding value to the business. By deferring any business logic in the data platform to the top layers, and avoiding locking yourself into a badly designed vault with too many complex links, your project will reap the profits further down the line.
And by using Data Vault, historical data is both fully auditable and recoverable in the event of back end problems, or other emergencies.
InfoVia’s Idaho-based Principal Architect Mike Magalsky believes the business value of creating a Data Vault, coupled with high-quality security, is the key driving force. Utilising automation and metadata-driven code generation will resolve many pitfalls of such a project, in Mike’s experience.
Leveraging automation and iterations – repeated use – to build a working model will overcome many obstacles and produce better results. And using the layers of the Data Vault to manage business rules will ensure business value and efficient data delivery.
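To make the idea of metadata-driven code generation concrete, here is a minimal sketch: one metadata entry per hub drives a generated loading statement, so adding a new source becomes a configuration change rather than hand-written ETL. The table, column, and source names below are illustrative assumptions, not anything from the talk.

```python
# A minimal sketch of metadata-driven code generation for a Data Vault
# hub load. All table/column names here are hypothetical examples.
from string import Template

HUB_LOAD_TEMPLATE = Template("""\
INSERT INTO $hub (${hub_key}, $business_key, load_date, record_source)
SELECT DISTINCT
    MD5($business_key) AS $hub_key,
    $business_key,
    CURRENT_TIMESTAMP,
    '$source'
FROM $staging_table s
WHERE NOT EXISTS (
    SELECT 1 FROM $hub h WHERE h.$business_key = s.$business_key
);""")

def generate_hub_load(metadata: dict) -> str:
    """Render one hub-loading statement from one metadata entry."""
    return HUB_LOAD_TEMPLATE.substitute(metadata)

# One dictionary per hub is the only thing a developer maintains.
customer_hub = {
    "hub": "hub_customer",
    "hub_key": "customer_hk",
    "business_key": "customer_id",
    "staging_table": "stg_crm_customer",
    "source": "CRM",
}

print(generate_hub_load(customer_hub))
```

Because every hub follows the same pattern, the same template can be rendered in a loop over dozens of metadata entries – which is where the iteration Mike describes pays off.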
And the use of expert advice will avoid costly mistakes, Mike stressed.
VaultSpeed’s Dirk Vermeiren – the Chief Technology Officer of the Belgian firm responsible for a data warehouse automation tool – echoed many of the sentiments of the other speakers. Having started in 1993 with a project for a major Belgian bank, Dirk has a dozen years of experience dealing with large Data Vault projects.
Taking longer to build a Data Vault – but insisting on using an automation tool to deal with the difficulties of writing ETL code – will pay back dividends tenfold in the end.
The raw data level has to embrace what Dan Linstedt calls the single truth of the data. Otherwise constant business changes – which will alter the raw data over time – will require major re-engineering further down the line. That will prove both costly and prone to errors, while also making auditing and tracing even more complex.
Neil Strange, CEO of Datavault and chair of the User Group, drew the webinar to a close by emphasising three important lessons gained during his 30 years’ experience as a business consultant. Experimenting is the best way to get the data platform that is right for your business, Neil said.
Data Vault enables agile working so there is no need to over-perfect your project’s stated aim – or risk getting locked into a permanent solution that risks not being what the company requires further down the line. That is a big difference from the early days of constructing data warehouses which had to be very rigid in their design and operation.
With many organisations looking for a Goldilocks solution – i.e. one that fits all sizes – it is easy to overlook the fact that with Data Vault 2.0 you can easily rework your platform without committing lots of financial resources.
Neil revealed how he had struggled to understand whether the business vault sat in its own schema or was a separate database with its own structure and rules. He worked out that it sits as part of the raw vault: by overlaying the raw vault with business rules – treated like another source system – you will ensure better results.
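A small sketch of the pattern Neil describes: the business vault is not a separate database but a rule-derived layer over the raw vault, keyed on the same hub keys and loaded as if it were just another source. The rule and attribute names below (a "high value customer" flag) are hypothetical, chosen only to illustrate the shape.

```python
# Sketch: deriving a business-vault satellite from raw-vault satellite
# rows. Names like customer_hk and lifetime_spend are illustrative.
from datetime import datetime, timezone

def derive_business_satellite(raw_rows):
    """Apply a business rule to raw satellite rows, producing rows for
    a business-vault satellite on the same hub key. The record source
    marks the rule layer as if it were another source system."""
    out = []
    for row in raw_rows:
        out.append({
            "customer_hk": row["customer_hk"],              # same hub key as raw
            "is_high_value": row["lifetime_spend"] > 1000,  # the business rule
            "load_date": datetime.now(timezone.utc),
            "record_source": "BUSINESS_RULES",              # the 'overlay' source
        })
    return out

raw = [{"customer_hk": "a1", "lifetime_spend": 2500},
       {"customer_hk": "b2", "lifetime_spend": 300}]
for r in derive_business_satellite(raw):
    print(r["customer_hk"], r["is_high_value"])
```

Because the derived rows live alongside the raw vault rather than replacing it, a changed rule can simply be re-run over the untouched raw history.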
And when it comes to preparing information marts, Neil reminded the audience that reporting also has patterns, and recommended the use of PITs and Bridges.
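The point of a PIT (point-in-time) table is that it pre-computes, for each hub key and each reporting snapshot date, which satellite version was current – so mart queries do a simple join instead of re-running "latest row before this date" logic. Here is a minimal in-memory sketch of that construction; the table and column names are illustrative assumptions.

```python
# A minimal sketch of building a point-in-time (PIT) table in memory.
# Table and column names (sat_customer, customer_hk, ...) are examples.
from datetime import date

# Raw satellite: multiple versions per hub key, tracked by load_date.
sat_customer = [
    {"customer_hk": "a1", "load_date": date(2021, 1, 1), "city": "Utrecht"},
    {"customer_hk": "a1", "load_date": date(2021, 1, 5), "city": "Amsterdam"},
    {"customer_hk": "b2", "load_date": date(2021, 1, 3), "city": "Berlin"},
]

def build_pit(satellite, hub_keys, snapshot_dates):
    """For each (hub key, snapshot date), record the load_date of the
    satellite row that was current at that snapshot (None if no row
    existed yet)."""
    pit = []
    for snap in snapshot_dates:
        for hk in hub_keys:
            versions = [r["load_date"] for r in satellite
                        if r["customer_hk"] == hk and r["load_date"] <= snap]
            pit.append({"customer_hk": hk, "snapshot_date": snap,
                        "sat_load_date": max(versions) if versions else None})
    return pit

pit = build_pit(sat_customer, ["a1", "b2"],
                [date(2021, 1, 2), date(2021, 1, 6)])
for row in pit:
    print(row)
```

A report for 6 January then joins on `(customer_hk, sat_load_date)` to pick up the Amsterdam row directly, which is the reporting pattern PITs exist to serve; Bridge tables play the analogous role for pre-joined paths across links.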
The six speakers generated a number of interesting questions and comments from members of the user group. And judging by the feedback, the audience agreed it was the most educational and informative session since the meetings went online in early 2020.
Watch the whole meeting by clicking on the video at the top of the page.
Sign up for the UK Data Vault User Group newsletter to make sure you do not miss out on our activities.