Mmmmm. This is one topic we hear a lot about. Sure, maybe not as much as Polars or DuckDB, but it’s like a continual dripping of water: the humdrum of “get better with your Data Quality” never ends.
Yet it’s probably safe to say the 80/20 rule applies: at most 20% of Data Teams are practicing any sort of “real” Data Quality. It’s something we all aspire to but never achieve.
One of the reasons that the Data Quality nut is so hard to crack is that it’s hard to eat an elephant. Where do you start? How do you convince leadership to spend money on some solution when the bottom line has now become our Lord, to be worshiped and cherished?
That’s what we are going to try to tackle today. A Primer on Data Quality. What is it, how do we do it, what tools can we use? Here is the outline of our Primer.
What is Data Quality?
Why is it hard to implement Data Quality?
What tools are available for Data Quality?
Home-Grown Data Quality.
Thanks to Delta for sponsoring this newsletter! I personally use Delta Lake on a daily basis, and I believe this technology represents the future of Data Engineering. Check out their website below.