
How Twine Can Efficiently Move Data From Data Vault to Data Mart

  • Hannah Dowse
  • May 23, 2024
  • 2 min read

Do you really need a PIT table?

One of the biggest stumbling blocks when undertaking Business Intelligence functions within an organisation centres around agile working. Whilst using a Data Vault within your data warehouse setup supports agile working, more often than not Kimball-style data marts do not. So, is there a way around this problem? The good news is yes, and, not for the first time, the solution is mathematically based. Erik Bouvin, a Swedish Data Vault Modeller and Data Engineer working for Stockholm-based Top of Minds, discussed ways to overcome the problem at the latest Data Vault User Group online discussion (which can be viewed here).

Erik explained how data can be moved between a Data Vault and a typical data mart in an agile way. Using Twine, an efficient set-based algorithm, means this can be achieved without data pipeline dependencies and while bypassing point-in-time (PIT) tables. More importantly, the process also avoids the need for complicated scripts. Erik stated that to grasp the Twine concept he would assume prior experience of data modelling using Data Vault, as well as Kimball modelling; previous knowledge of Twine would not be required.

To explain, Erik held up the definition of Twine by its creator, Lars Rönnbäck: “A Twine is an efficient set-based algorithm that can be applied when you have a table in which you have recorded a history of changes and some other table with related PIT, for which you want to know which historical rows were in effect at those different time points.” In other words, Twine can be used where there is a table that has recorded a history of changes and another table containing related points in time, and it tells you which historical rows were in effect at each of those points in time.

Erik talked through an example – recording sales by time and customer identification, including a timeline of when the event happened – and how that is represented in a typical data mart. He then discussed what happens when a change to the source system creates changes in the customer attributes that require the data to be reloaded. At that point, you are looking at the expense of having to recompute these changes, with a possible loss of performance. This is where Twine comes into its own, and Erik showed what the code looks like.
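Erik's SQL is not reproduced in this post, but the mechanics can be sketched in a few lines of Python. Everything below (the customer_history and sales tables, the segment attribute and the dates) is invented purely for illustration, and in a real warehouse the same pattern would be expressed as a single set-based SQL statement rather than a loop: union the change history with the points in time, sort once by key and time, and carry the in-effect row forward onto each fact.

# Minimal sketch of the twine idea, with invented names and data.

# Change history of customer attributes (the "dimension" side).
customer_history = [
    # (customer_id, changed_at, segment)
    (42, "2023-01-01", "bronze"),
    (42, "2023-06-15", "gold"),
    (7,  "2023-03-01", "silver"),
]

# Sales events (the "fact" side) supplying the points in time.
sales = [
    # (customer_id, sold_at, amount)
    (42, "2023-02-10", 100.0),
    (42, "2023-07-01", 250.0),
    (7,  "2023-04-05", 80.0),
]

# Twine: tag each row with its origin, union both sets, and sort by key and
# time so that history rows sort before facts at the same timestamp.
HISTORY, FACT = 0, 1
twined = sorted(
    [(cid, ts, HISTORY, seg) for cid, ts, seg in customer_history]
    + [(cid, ts, FACT, amt) for cid, ts, amt in sales],
    key=lambda row: (row[0], row[1], row[2]),
)

# One ordered pass: carry the latest in-effect history row forward onto each fact.
results = []
in_effect = {}  # customer_id -> (changed_at, segment) currently in effect
for cid, ts, origin, payload in twined:
    if origin == HISTORY:
        in_effect[cid] = (ts, payload)
    else:
        changed_at, segment = in_effect.get(cid, (None, None))
        results.append((cid, ts, payload, changed_at, segment))

for row in results:
    print(row)
# (7, '2023-04-05', 80.0, '2023-03-01', 'silver')
# (42, '2023-02-10', 100.0, '2023-01-01', 'bronze')
# (42, '2023-07-01', 250.0, '2023-06-15', 'gold')

The single sort and single pass is what makes the approach set-based and cheap: there is no row-by-row lookup against the dimension, and no PIT table has to be built and maintained in between.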

Erik gave an example of analysing seven billion transactions more efficiently using Twine. Twine also produces a strict pattern and can be easily automated. Erik stated that other benefits of Twine include:

  • Fact and SCD dim are always in sync with your chosen timeline

  • No lookups to dimensions are needed when loading Facts (see the sketch after this list)

  • Minimal ELT pipeline dependencies

  • No manual or semi-automated “re-keying” scripts are needed.
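
One way to read the last two points, continuing the invented example above: because the twine attaches the in-effect change timestamp to every fact row, the fact can reference its dimension by business key plus that timestamp. This is an assumption made for illustration rather than a description of Erik's exact design, but it shows why no surrogate-key lookup is needed while loading facts, and why reloading the dimension does not force any re-keying of facts.

# Dimension rows keyed by (customer_id, changed_at); they can be rebuilt at
# any time from the Data Vault without touching the facts.
dim_customer = {
    (42, "2023-01-01"): {"segment": "bronze"},
    (42, "2023-06-15"): {"segment": "gold"},
    (7,  "2023-03-01"): {"segment": "silver"},
}

# Fact rows already carry the in-effect timestamp from the twine, so loading
# them needs no join back to the dimension and no surrogate-key lookup.
fact_sales = [
    # (customer_id, in_effect_changed_at, sold_at, amount)
    (42, "2023-01-01", "2023-02-10", 100.0),
    (42, "2023-06-15", "2023-07-01", 250.0),
    (7,  "2023-03-01", "2023-04-05", 80.0),
]

# Query time: resolve attributes with a plain key match; if the dimension is
# reloaded, the facts still join correctly, so no "re-keying" script is needed.
for cid, changed_at, sold_at, amount in fact_sales:
    print(sold_at, amount, dim_customer[(cid, changed_at)]["segment"])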


The Twine process is agile all the way through, saving time, money and valuable company resources, and delivering greater returns on investment and efficiency savings in data analytics functions.
