Accelerate the mapping of your business taxonomy
- Andrew Griffin
- Oct 27, 2021
- 2 min read
Dirk Vermeiren explains how to use automation while avoiding the pitfalls of a source model or an enterprise model
Data warehouses have existed for more than 30 years, but today the automation that the Data Vault method facilitates is helping to dramatically improve the effectiveness and efficiency of data platform development. The problem for many businesses, especially those using multiple source systems across different departments, divisions and even territories, is how to automate the data load process. Much of the complexity comes from the ETL/ELT (extract, transform and load, or extract, load and transform) process. Data Vault lends itself to automation because it requires multiple objects, such as hubs, to be loaded using the same pattern and logic.
The key to using automation with Data Vault 2.0 is not falling into the trap of copying the source model. Vaultspeed’s chief technical officer Dirk Vermeiren offered some timely advice on dealing with some often contradictory axioms at this month’s Data Vault User Group. In his talk, ‘Accelerate the Mapping of your Business Taxonomy’, he recommended using a business taxonomy, a road map that provides the names and membership properties of each object within the business domain, to create the raw data vault, allowing for maximum automation and minimal manual intervention.
Specific rules classify and categorise any object in the domain, and they must be complete, consistent and unambiguous. That helps achieve better data quality and allows data assets to be managed through data governance. So, for example, the business will require an abstract model, comprising hubs and links, to understand the relationships between customers and product sales or services sold. When combined with a business taxonomy, that model can create a semi-integrated raw data vault layer.
Following the source model instead will result in a “fake” data vault with too many objects in the raw data vault. That risks leaving the business unable to understand a “too complex” model, limiting the scope of automated integration work and leaving ELT loading reliant on manual input.
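The contrast between copying the source model and mapping through a business taxonomy can be illustrated with a minimal Python sketch. It is not Vaultspeed’s implementation. The source systems, table names, key columns and the SQL template below are all hypothetical, chosen only to show how a single load pattern can be reused for every source once each source object has been assigned to a business concept.

```python
from collections import defaultdict

# Hypothetical business taxonomy: each (source system, source table) is
# classified under exactly one business concept, with the column that
# carries its business key. All names here are illustrative assumptions.
TAXONOMY = {
    ("crm", "clients"): ("customer", "client_no"),
    ("billing", "accounts"): ("customer", "cust_id"),
    ("webshop", "users"): ("customer", "user_ref"),
    ("erp", "items"): ("product", "item_code"),
    ("webshop", "articles"): ("product", "sku"),
}

# One generic hub-load pattern, written as dialect-neutral illustrative SQL.
# The hashing expression is an assumption, not a prescribed standard.
HUB_LOAD_TEMPLATE = (
    "INSERT INTO hub_{concept} ({concept}_hk, {concept}_bk, record_source, load_date)\n"
    "SELECT DISTINCT MD5(UPPER(TRIM(CAST(src.{key} AS VARCHAR)))), src.{key},\n"
    "       '{system}.{table}', CURRENT_TIMESTAMP\n"
    "FROM {system}.{table} AS src\n"
    "WHERE NOT EXISTS (\n"
    "    SELECT 1 FROM hub_{concept} h WHERE h.{concept}_bk = src.{key}\n"
    ");"
)


def source_model_hubs(taxonomy):
    """Copying the source model: one hub per source table."""
    return sorted(f"hub_{system}_{table}" for system, table in taxonomy)


def hubs_from_taxonomy(taxonomy):
    """Taxonomy-driven model: one hub per business concept, fed by many sources."""
    hubs = defaultdict(list)
    for (system, table), (concept, key) in taxonomy.items():
        hubs[concept].append((system, table, key))
    return dict(hubs)


def hub_load_statements(taxonomy):
    """Generate one load statement per source from the same template."""
    return [
        HUB_LOAD_TEMPLATE.format(concept=concept, key=key, system=system, table=table)
        for (system, table), (concept, key) in taxonomy.items()
    ]


if __name__ == "__main__":
    print("Source-model hubs:", source_model_hubs(TAXONOMY))
    print("Taxonomy hubs:", sorted(f"hub_{c}" for c in hubs_from_taxonomy(TAXONOMY)))
    print(hub_load_statements(TAXONOMY)[0])
```

In this toy setup, copying the source model produces five hubs, one per table, while the taxonomy groups the same five sources into two hubs, `hub_customer` and `hub_product`. The load code for every source comes from the same template, so adding a new source only requires a new taxonomy entry.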
So by following Dirk’s recommended way of working, integration is achieved across all the sources, thanks to the master data from the business taxonomy grouping the business elements together. Not only is the conceptual model represented in the physical data model, enabling the business to understand it, but the integration also occurs through the automation logic, saving staff time and the company money.


