
Jump start your data warehouse

  • Writer: Rhys Hanscombe
  • Dec 3, 2019
  • 2 min read

In December 2019, Alex Higgs delivered a practical webinar on how to combine dbt, Data Vault 2.0, and Snowflake to build modern, scalable, and maintainable data warehouses. Here’s a summary of the key insights and actionable strategies from that session.


The Modern Data Warehousing Landscape

Today’s data warehousing is defined by:

  • Cloud-based infrastructure

  • Agile development and ETL automation

  • Continuous integration and shorter development cycles

  • Strong data governance

  • Self-service business intelligence and advanced dashboarding

  • AI-driven analytics


These trends demand tools and frameworks that are flexible, scalable, and easy to automate.


What Is dbt?

dbt (data build tool) is a free, open-source command-line tool built in Python. It’s designed for data analysts and engineers to transform raw data into actionable insights. dbt is the “T” in ELT, enabling you to:

  • Write modular, reusable SQL with templates and macros

  • Standardize and automate data transformations

  • Reduce errors and manual coding

  • Leverage multi-threaded and incremental processing

  • Maintain documentation and test data quality

  • Integrate with popular databases like Snowflake, Redshift, BigQuery, and more


dbt Cloud offers enterprise features such as a web-based IDE, scheduling, version control, and role-based security.
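
As a sketch of what this looks like in practice (the model, source, and column names here are illustrative, not taken from the webinar), a dbt model is just a templated SQL SELECT — dbt compiles the Jinja and handles the surrounding DDL:

```sql
-- models/staging/stg_orders.sql
-- Minimal dbt model sketch: dbt renders the Jinja, wraps the SELECT in the
-- appropriate CREATE statement, and runs it against the target warehouse.
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    customer_id,
    order_date,
    current_timestamp() as load_datetime
from {{ source('tpch', 'orders') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than those already loaded
  where order_date > (select max(order_date) from {{ this }})
{% endif %}
```

The `is_incremental()` block is what gives dbt its incremental processing: the first run builds the full table, and subsequent runs append only new rows.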


Introducing AutomateDV: Easy-Entry Data Vault Automation

AutomateDV is a dbt package that automates the creation of Data Vault 2.0 structures. Key features include:

  • Metadata-driven automation: Provide metadata, not SQL

  • Standardized templates for hubs, links, satellites, and transactional links

  • Macros to simplify complex SQL

  • Fully tested with documentation and hands-on examples

  • Designed for easy adoption and proof-of-concept projects


AutomateDV helps you quickly implement Data Vault 2.0 best practices, saving time and reducing complexity.
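
To give a flavour of the metadata-driven approach (entity and column names here are illustrative), a hub model in AutomateDV reduces to a handful of metadata variables and a single macro call — the package generates the loading SQL for you:

```sql
-- models/raw_vault/hub_customer.sql
-- Hub loading with AutomateDV: provide metadata, the macro writes the SQL.
{%- set source_model = "stg_customer" -%}
{%- set src_pk = "CUSTOMER_HK" -%}       -- hashed business key (hub PK)
{%- set src_nk = "CUSTOMER_ID" -%}       -- natural/business key
{%- set src_ldts = "LOAD_DATETIME" -%}   -- load date timestamp
{%- set src_source = "RECORD_SOURCE" -%} -- record source column

{{ automate_dv.hub(src_pk=src_pk, src_nk=src_nk,
                   src_ldts=src_ldts, src_source=src_source,
                   source_model=source_model) }}
```

Links, satellites, and transactional links follow the same pattern with their own macros, which is what makes the approach scale to dozens of tables.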


How Does AutomateDV Fit into the Data Pipeline?

AutomateDV sits at the heart of the modern data pipeline:

  1. Source systems feed data into persistent staging layers.

  2. Data is loaded and transformed using dbt and AutomateDV macros.

  3. Raw and business Data Vault layers are built in Snowflake.

  4. Analytics and dashboards are powered by clean, trusted data marts.


This architecture enables rapid, repeatable, and auditable data warehouse development.
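
To make step 2 concrete (the source and column names are illustrative), a staging model declares its metadata inline and hands it to the staging macro, which layers hash keys and derived columns on top of the raw source:

```sql
-- models/staging/stg_customer.sql
-- AutomateDV staging sketch: metadata in, hashed and derived columns out.
{%- set yaml_metadata -%}
source_model: raw_customer
derived_columns:
  RECORD_SOURCE: "!TPCH"
  LOAD_DATETIME: CURRENT_TIMESTAMP()
hashed_columns:
  CUSTOMER_HK: CUSTOMER_ID
{%- endset -%}

{% set metadata_dict = fromyaml(yaml_metadata) %}

{{ automate_dv.stage(include_source_columns=true,
                     source_model=metadata_dict['source_model'],
                     derived_columns=metadata_dict['derived_columns'],
                     hashed_columns=metadata_dict['hashed_columns'],
                     ranked_columns=none) }}
```

The staged model then feeds the raw vault layer in step 3, so hashing logic lives in one place rather than being repeated in every hub, link, and satellite.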


Real-World Example: Snowflake TPC-H with Data Vault 2.0

The webinar showcased a worked example using Snowflake’s TPC-H dataset:

  • Profiling source data and identifying relationships

  • Simulating transaction feeds

  • Building 25 tables: 7 hubs, 7 links, 1 transactional link, and 10 satellites


This demonstrates how AutomateDV and Snowflake can handle complex, real-world data models with ease.


Development Insights and What’s Next

  • Metadata ingestion and balancing usability with maintainability are key challenges.

  • AutomateDV is evolving, with plans for more Data Vault 2.0 structures (effectivity satellites, PIT tables, bridge tables, reference tables, and more).

  • Internal tools and a planned web app will further automate and simplify Data Vault development.


Summary: The Power of dbt, Data Vault, and Snowflake

  • dbt streamlines and automates your data warehouse workflow with templated transformations and robust features.

  • Snowflake provides a scalable, cloud-native backbone for your data warehouse.

  • AutomateDV delivers metadata-driven templates for rapid Data Vault 2.0 implementation.


Together, these tools enable you to jump start your Data Vault 2.0 data warehouse and deliver trusted, scalable analytics faster than ever.
