50 First Dates with Data Vault
- Hannah Dowse
- Jan 8, 2024
A user’s story of Data Vault adoption
The Data Vault User Group celebrated its fifth birthday in September, and one of the great advantages of its monthly presentations is that our members hear some very personal, and occasionally very raw, experiences of implementing a Data Vault. The Data Vault User Group is about shared experiences: learning from practitioners with varying levels of both Data Vault and data warehousing experience can be invaluable in guiding and shaping your own understanding, whether that be migrating to the Cloud, dealing with legacy systems, or starting with a fresh canvas. Analytics engineer Connor Lough, who had previously worked for Nordstrom and Pitchbook, presented his journey in ‘Fifty First Dates with Data Vault’, a recording of which is available to watch. Starting literally from scratch, he had to figure out what his organisation needed in order to understand its data and how to map it in the data model, qualify as a Data Vault 2.0 practitioner and upskill a small team, all while responding to the business users’ needs and managing his own steep learning curve. His story certainly gave plenty of food for thought…
Well, eating on a first date is important… surely? Connor certainly told his love story with Data Vault with few feelings spared. He walked the room through his project’s 14-month timeline and was not afraid to admit where he got things wrong. Connor’s company, View The Space (VTS), works in real estate, offering leasing and asset management services to landlords and property managers, as well as tenants and brokers. His inherited app-based system was built on the productivity tool Asana. The mission was to produce an all-encompassing dashboard with full visibility across multiple levels of the project structure. At the outset, some 14 months ago, VTS had three data teams covering ingestion, transformation and data science (using machine learning). Connor admitted that at first he just wanted some organisation, in fact “any organisation”, to make sense of the data buried inside Asana’s ‘custom_fields’ column. Having read a few blogs on Data Vault, he set about trying to revolutionise VTS’ data culture. The first problem was falling into the familiar trap of merely mapping the sources for the first iteration of his Data Vault. As more experienced data professionals know only too well, doing just that is the one thing Data Vault founder Dan Linstedt warns will guarantee failure. As Connor said: “I quickly realised that what I’d built was NOT A DATA VAULT, and that it was time to do some learning.” What he had produced was not temporal, lacked metadata and, crucially, had NO BUSINESS KEYS. In summary, the modelling was wrong, and so were the architecture and the methodology. But he was still proud of what he had built.
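To make those three missing ingredients concrete, here is a minimal Python sketch of a hub and satellite loaded from business keys. It is an illustration under stated assumptions, not Connor's implementation or AutomateDV's generated SQL: the `project_code`-style keys, the `"asana"` record source and the helper names `hash_key`, `hashdiff` and `load` are all hypothetical. It uses MD5 hash keys over trimmed, upper-cased business keys, a common Data Vault 2.0 convention. The hub records each business key once, with load date and record source as metadata; the satellite is insert-only and adds a row only when the attribute hash (the hashdiff) changes, which is what makes the model temporal.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*parts: str) -> str:
    """Hub hash key: MD5 of trimmed, upper-cased business key parts (a common DV 2.0 convention)."""
    normalized = "||".join(p.strip().upper() for p in parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def hashdiff(attrs: dict) -> str:
    """Hash of the descriptive attributes, in sorted key order, used to detect changes."""
    payload = "||".join(f"{k}={str(v).strip()}" for k, v in sorted(attrs.items()))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    hub_key: str         # hash of the business key
    business_key: str    # the identifier the business actually uses (first value seen)
    load_date: datetime  # metadata: when the key was first loaded
    record_source: str   # metadata: where the key came from


@dataclass(frozen=True)
class SatRow:
    hub_key: str
    load_date: datetime
    hashdiff: str
    attributes: tuple    # sorted (name, value) pairs
    record_source: str


def load(records, hub: list, sat: list, load_date: datetime, record_source: str) -> None:
    """Insert-only load of (business_key, attributes) pairs into a hub and a satellite.

    Assumes `sat` is already in load order, so the last row per key is the latest.
    """
    known = {h.hub_key for h in hub}
    latest = {s.hub_key: s.hashdiff for s in sat}  # later rows overwrite earlier ones
    for business_key, attrs in records:
        hk = hash_key(business_key)
        if hk not in known:  # new business key: one hub row, ever
            hub.append(HubRow(hk, business_key, load_date, record_source))
            known.add(hk)
        hd = hashdiff(attrs)
        if latest.get(hk) != hd:  # changed (or first) attributes: new satellite row
            sat.append(SatRow(hk, load_date, hd, tuple(sorted(attrs.items())), record_source))
            latest[hk] = hd


if __name__ == "__main__":
    hub, sat = [], []
    day1 = datetime(2023, 1, 1, tzinfo=timezone.utc)
    day2 = datetime(2023, 1, 2, tzinfo=timezone.utc)
    load([("prj-001", {"status": "open"})], hub, sat, day1, "asana")
    load([("PRJ-001 ", {"status": "closed"})], hub, sat, day2, "asana")
    print(len(hub), len(sat))
```

Because the second load's key normalises to the same hash, the hub keeps one row while the satellite keeps both versions of the status. A source-mapped copy of the Asana tables would typically have overwritten the old value and had no stable key to hang it on.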
How quickly time changes one’s perspective! The next step of his journey took him to Finland, where Connor underwent CDVP 2.0 training, while also rapidly reading John Giles’ book The Elephant in the Fridge and joining the Data Vault Alliance forum (don’t forget the Data Vault User Group has a free forum where you can learn from others’ experiences). The next reality that dawned on Connor was that he did not know much about the industry he was working in, or about the company he had only recently joined. Moreover, he did not have the authority to impose much of what was needed to make a Data Vault work. At this point, Connor started using AutomateDV (formerly dbtvault), a dbt package for Data Vault automation, reducing the need for hand coding. Connor was helped by finding a mentor within VTS, someone who knew the most about the company’s backend. Their weekly meetings led him to seek out key business users and speak to them individually to uncover the core business objects and to standardise ways of working. He realises now that these conversations should have been held with a group, allowing the users to define those core objects collaboratively, rather than him leading the witnesses with his chosen lines of questioning on a one-to-one basis.
By the time the annual WWDVC in Vermont came along in May 2023, Connor was fully signed up to Meg Rush’s thoughts about ‘Ditching the Geek Speak at the Door.’
Because he had failed to make the most of the engagement process, the second iteration of Connor’s output was still a source-system Data Vault: no matter how good the stakeholder discussions had been, there was no consensus.
Connor admits he basically tried to “boil the ocean,” taking on too much at once.
Iteration two had removed any chance of quick wins or of delivering value to the business, which is ultimately what the users need and want.
At this point Connor realised that not only did he need to start smaller and slower, but he also needed to train other members of the team to help build the Data Vault.
But although he had identified two team members for iteration 2.1, there was no budget for them to undergo CDVP 2.0 certification.
The training was split into three sections: an overview of the Data Vault; local development, walking through the folders; and finally the methodology and ways of working.
The key takeaway from this training was that not everyone needs to understand everything about Data Vault.
But there was still one big elephant in the room for Connor to confront… the VTS app’s backend. Its identifiers were used by the business, but they were PROBABLY NOT the business keys.
Connor compared the next set of iterations to the classic Benny Hill TV show sequence in which a line of incensed people chase the protagonist in ever-decreasing circles, sped up.
At this point, our speaker turned to Patrick Cuba’s book The Data Vault Guru in search of his nirvana: simply getting data out of the vault.
More subtle refinements continued as 2023 wore on, but more recently Connor came to work one day to be confronted by the thing every leader, from the world of politics through to business, struggles with… events.
Following a company restructure, there were suddenly 25 per cent fewer people in engineering. Where did that leave hopes of a Data Vault panacea?
Deprioritised was the reality. Instead, VTS decided to scale back to a dimensional model – although a true Data Vault is still the long-term goal.
So as 2023 draws to a close, Connor’s conclusions can be summed up quite simply.
Consider getting outside help. There is plenty of help out there to kickstart a Data Vault project, and expert consultants can save you a lot of time and headaches during those first 50 dates, and teach you a great deal along the way.
Real-life experience is critical. The best books out there can teach you plenty, but there is no substitute for the real thing.
So Connor recommends building your own Data Vault in a personal Git repository, using Kaggle data or generating your own with a package such as Faker. “The more real you treat it, the better,” added Connor.
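As a starting point for that kind of practice project, here is a small, dependency-free sketch of a synthetic source system. Faker is a third-party package, so this version swaps in Python's standard `random` module with a fixed seed; everything in it (the `daily_extracts` function, the tenant entity, the `TEN-0001` style business keys, the cities and statuses) is hypothetical and invented for illustration. Its one real purpose is to produce several daily extracts in which some attributes change, so a practice vault has history to capture and you can test whether your satellites really are temporal.

```python
import random
from datetime import date, timedelta

# Hypothetical reference values for an invented "tenant" entity.
CITIES = ["New York", "London", "Toronto", "Chicago"]
STATUSES = ["prospect", "negotiating", "signed"]


def daily_extracts(n_tenants: int, n_days: int, change_rate: float, seed: int = 42):
    """Yield (extract_date, rows) pairs, one per day, starting on 2023-01-01.

    Each row is a dict with a stable business key ("tenant_code") plus attributes.
    From the second day on, each tenant's status is re-rolled with probability
    `change_rate`; the new random status may happen to equal the old one.
    """
    rng = random.Random(seed)  # fixed seed: every run produces the same data
    tenants = {
        f"TEN-{i:04d}": {"city": rng.choice(CITIES), "status": STATUSES[0]}
        for i in range(1, n_tenants + 1)
    }
    start = date(2023, 1, 1)
    for day in range(n_days):
        if day > 0:
            for attrs in tenants.values():
                if rng.random() < change_rate:
                    attrs["status"] = rng.choice(STATUSES)
        # Build fresh dicts so later mutations don't alter extracts already yielded.
        rows = [{"tenant_code": code, **attrs} for code, attrs in tenants.items()]
        yield start + timedelta(days=day), rows


if __name__ == "__main__":
    for extract_date, rows in daily_extracts(n_tenants=5, n_days=3, change_rate=0.3):
        print(extract_date, rows[0])
```

Feeding each day's rows into your own hub and satellite loads, in date order, is one way to "treat it real": the business keys stay stable while the descriptive attributes drift, much as they would in a live source system.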
Author: ANDREW GRIFFIN