
Data Vault and Machine Learning – does it fit together?

  • Writer: Rhys Hanscombe
  • Sep 5, 2023
  • 2 min read

At the 2023 Data Vault Conference, Torsten Glunde from Alligator Company tackled a question on many minds: can Data Vault and Machine Learning really work hand-in-hand? Here’s a friendly summary of the session’s insights—and a reminder to join our forum to keep the conversation going!


The Challenge: Bridging Data Engineering and Data Science

Most companies still produce data in an application-centric, IT-driven way—treating data as a by-product rather than a strategic asset. This leads to scattered, poorly managed data that’s hard to leverage for analytics or machine learning. Shouldn’t we all be more data-driven?


The Reality: Data Science and BI Are Still Siloed

  • People and platforms are separated: SQL pipelines for BI, Python pipelines for data science.

  • 60–80% of the work is data engineering and prep: Data scientists and analysts often repeat the same data wrangling independently, sometimes arriving at different results.

  • The solution? Move toward a model-driven, unified platform where data assets are reusable and automation is the norm. SQL or Python? Why not both?
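To make "why not both?" a little more concrete, here is a minimal sketch of one prepared data asset being consumed from SQL and from Python. It is an illustration only, not something shown in the session: the in-memory SQLite database, the orders table, and the customer_orders view are all made-up stand-ins for a real platform.

```python
import sqlite3

# Hypothetical stand-in for a shared data platform (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id TEXT, amount REAL);
    INSERT INTO orders VALUES ('C-001', 20.0), ('C-001', 30.0), ('C-002', 5.0);

    -- The reusable asset: the wrangling is defined once, here.
    CREATE VIEW customer_orders AS
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id;
""")

# BI-style consumer: plain SQL against the shared view.
top_customer = conn.execute(
    "SELECT customer_id FROM customer_orders "
    "ORDER BY total_amount DESC LIMIT 1"
).fetchone()[0]

# Data-science-style consumer: the same view, read into Python feature vectors.
features = {
    customer_id: [order_count, total_amount]
    for customer_id, order_count, total_amount in conn.execute(
        "SELECT customer_id, order_count, total_amount "
        "FROM customer_orders ORDER BY customer_id"
    )
}
```

Because both consumers read from the same view, the aggregation logic lives in one place and the two teams cannot drift apart on how "total amount per customer" is calculated.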


The Modern Data Stack: New Tech, Same Data Problems

With tools like Snowflake, Databricks, dbt, Airflow, and more, it’s easier than ever to build a modern data stack. But the real challenge is organizing and governing your data so it’s ready for analytics and machine learning. The right technology is only part of the answer—how you structure and manage your data is key.


Data as a Product: The Data Vault Backbone

  • Data Vault provides a model-driven approach: Business models (logical) and Data Vault models (physical) work together to create a stable, auditable, and extensible foundation.

  • Automation and separation of concerns: Data Vault enables automation, information hiding, and clear governance—making it easier to deliver high-quality, well-understood data for analytics and ML.

  • Data as a product: Data products should be easy to use, auditable, and benefit from all available data. Data Vault helps make this possible by providing a backbone for analytics and machine learning.
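As a rough illustration of what "model-driven" automation can look like, the sketch below generates rows for a Data Vault hub from a small declarative description rather than hand-written load logic. Everything named here is an assumption made for the example: the HubSpec class, the hub_customer and customer_id names, the choice of SHA-256 for the hash key, and the trim-and-uppercase normalisation rule. Real Data Vault automation tools have their own conventions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HubSpec:
    """Hypothetical declarative description of one hub."""
    name: str
    business_key: str
    record_source: str


def normalise(value: str) -> str:
    # Assumed rule: trim and upper-case so ' c-001 ' and 'C-001' match.
    return value.strip().upper()


def hash_key(business_key_value: str) -> str:
    # Assumed hashing choice (SHA-256) over the normalised business key.
    return hashlib.sha256(normalise(business_key_value).encode("utf-8")).hexdigest()


def build_hub_rows(spec: HubSpec, source_rows: list[dict], load_ts: datetime) -> list[dict]:
    """Generate de-duplicated hub rows, driven only by the spec."""
    seen = set()
    hub_rows = []
    for row in source_rows:
        raw = str(row[spec.business_key])
        key = hash_key(raw)
        if key in seen:
            continue  # a hub holds each business key once
        seen.add(key)
        hub_rows.append({
            f"{spec.name}_hk": key,
            spec.business_key: normalise(raw),
            "load_ts": load_ts,                    # audit column
            "record_source": spec.record_source,   # audit column
        })
    return hub_rows


customer_hub = HubSpec(name="hub_customer", business_key="customer_id", record_source="crm")
rows = [{"customer_id": "c-001"}, {"customer_id": " C-001 "}, {"customer_id": "c-002"}]
hub = build_hub_rows(customer_hub, rows, datetime(2023, 9, 5, tzinfo=timezone.utc))
```

The point of the sketch is the separation of concerns: adding another hub means adding another spec, while the loading rule, the audit columns, and the key handling stay the same for every consumer, BI or ML.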


Data Vault + Machine Learning: The Takeaways

  • Shared platform: Data scientists and analysts should work from the same, well-governed data platform.

  • Model-driven automation: Use Data Vault automation rules to streamline data ingestion, transformation, and delivery for both BI and ML.

  • Quality matters: Analytics and ML benefit most from high-quality, well-understood data—something Data Vault is designed to deliver.

  • Decentralized organizations (Data Mesh): Even when data ownership is distributed, a centrally agreed logical data model is still important for consistency and governance.

  • Source data quality: This remains a challenge, and addressing it is essential for trustworthy analytics and ML.


Join the Data Community!

  • Sign up to our forum: Share your experiences, ask questions, and connect with fellow data enthusiasts.

  • Stay up to date: We host regular webinars, workshops, and meetups—don’t miss out!

  • Shape the future: Your feedback and participation help us build better tools and resources for everyone.


Final Thoughts

Data Vault and Machine Learning absolutely fit together—when you have the right foundation, governance, and community support. With a model-driven approach, automation, and a focus on data quality, you can unlock the full potential of your data for both analytics and AI.


Ready to bridge the gap between Data Vault and Machine Learning? Join the conversation, sign up for our next webinar, and let’s build the future of data together!
