Data Quality: Prevention Is Better Than the Cure
- Rhys Hanscombe

- Mar 20, 2025
- 2 min read
Updated: May 22, 2025
In a recent webinar hosted by the Data Vault User Group, Andrew Jones delved into the critical topic of data quality, emphasizing that prevention is indeed better than the cure. As data practitioners, we often encounter the high costs and disappointing outcomes that follow from poor data quality. Andrew highlighted that many organizations struggle to improve data quality, resorting to costly workarounds instead of addressing the root causes.
The Importance of Data Quality
Andrew began by underscoring the significance of data quality, citing a dbt survey in which 57% of data practitioners named data quality a major obstacle to preparing data for analysis, up from 41% the previous year. Business leaders face a similar challenge: 64% believe data analytics can provide a competitive advantage, yet only one in five is using it to increase revenue. This gap between aspiration and reality is largely due to poor data quality.
The Cost of Poor Data Quality
Gartner estimates that poor data quality costs organizations an average of $12.9 million annually. Andrew stressed the importance of articulating these costs to secure resources and alignment for improving data quality at the source. He introduced the 1-10-100 rule, which holds that the earlier a data quality issue is handled, the cheaper it is to deal with. This principle, familiar from healthcare and software engineering, applies to data quality as well.
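As a rough illustration of the 1-10-100 rule, here is a minimal sketch in Python; the dollar figures and issue counts are hypothetical assumptions, not numbers from the webinar:

```python
# Hypothetical illustration of the 1-10-100 rule: the later a data
# quality issue is dealt with, the more it costs.
COST_MULTIPLIERS = {
    "prevention": 1,    # caught at the source, e.g. by a contract check
    "remediation": 10,  # caught downstream, e.g. by observability/alerting
    "failure": 100,     # never caught: bad decisions, rework, lost trust
}

BASE_COST_PER_ISSUE = 100  # assumed unit cost in dollars, for illustration
ISSUES_PER_YEAR = 50       # assumed volume, for illustration

for stage, multiplier in COST_MULTIPLIERS.items():
    annual_cost = BASE_COST_PER_ISSUE * multiplier * ISSUES_PER_YEAR
    print(f"{stage:>11}: ${annual_cost:,} per year")
```

Under these assumptions, the same fifty issues cost $5,000 a year to prevent but $500,000 a year to absorb as failures, which is the rule's core argument for fixing problems at the source.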
Prevention Over Remediation
Andrew advocated for shifting focus from remediation to prevention. While remediation efforts like observability and alerting are better than nothing, they are still costly and catch issues too late to fix them at the source. Prevention, by contrast, means catching issues early, reducing the complexity and cost of ETL processes, and maintaining user trust in data.
Practical Example: Upstream Breaking Schema Changes
Andrew gave a practical example of upstream breaking schema changes, a common issue that led his team to adopt Data Contracts. He explained the importance of performing root cause analysis to understand and address the causes of data quality issues. Using a fishbone diagram, he illustrated how various factors contribute to upstream schema changes and how addressing those factors can prevent issues, as in the sketch below.
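Here is a minimal sketch of what catching such a change at the source can look like; the expected schema, field names, and incoming record are hypothetical, and the webinar did not prescribe a specific implementation:

```python
# Minimal sketch: detect a breaking upstream schema change before the
# data is loaded, instead of debugging a broken pipeline afterwards.
EXPECTED_SCHEMA = {"order_id": int, "customer_id": int, "amount": float}

def find_breaking_changes(record: dict) -> list[str]:
    """Return a list of violations of the expected schema."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems

# An upstream producer has renamed 'amount' to 'total' and made it a string.
incoming = {"order_id": 42, "customer_id": 7, "total": "19.99"}
violations = find_breaking_changes(incoming)
if violations:
    # Reject or quarantine at the boundary rather than propagating bad data.
    raise ValueError("breaking schema change: " + "; ".join(violations))
```

The point is not this particular check but where it runs: at the producer-consumer boundary, before bad data incurs the tenfold remediation cost described above.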
Data Contracts as a Solution
Data Contracts create an interface between data producers and consumers, setting expectations and facilitating the generation of quality data. Andrew described Data Contracts as the API for data: stable interfaces that can evolve over time without breaking consumers (a minimal sketch follows the conclusion below). He emphasized the importance of collaboration and communication between data teams and data producers to ensure data quality.
Conclusion
Andrew concluded by encouraging data teams to focus on prevention, perform root cause analysis, and implement Data Contracts to address data quality issues at the source. By doing so, organizations can reduce the costs associated with poor data quality and deliver greater value.
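To make "an API for data" concrete, here is a hypothetical contract expressed in code; the stream name, fields, owning team, and versioning convention are all illustrative assumptions rather than a specific tool's format:

```python
from dataclasses import dataclass

# Hypothetical data contract: an explicit, versioned interface that a
# producer publishes and consumers depend on. All names are illustrative.
@dataclass(frozen=True)
class OrderEventV1:
    """Contract v1 for the 'orders' stream, owned by the checkout team."""
    order_id: int
    customer_id: int
    amount: float
    currency: str = "USD"  # optional fields with defaults are additive

# Like an API, the contract evolves without breaking consumers: new
# optional fields are additive, while removals or renames require a new
# major version agreed with consumers before the producer ships it.
event = OrderEventV1(order_id=42, customer_id=7, amount=19.99)
print(event)
```

In practice such a contract would also be backed by checks in the producer's deployment pipeline, which is how it prevents the upstream breaking changes described earlier.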