A Gentle Introduction to Data Quality.
Data Quality is a hot topic. From zero to hero in no time.
Data Quality came out of nowhere this year. You would think, after decades of building Data Warehouses, Data Lakes, and Data Platforms, that the data community would have committed itself to Data Quality (DQ) about 10 years ago.
But alas, a prophet is never accepted in his homeland. DQ has always been an afterthought at 99% of data organizations, and honestly, it still is. The difference is that it's being recognized and talked about in the open.
Will Data Quality tools be a hot topic in the near future? I have my doubts. The tooling lags far behind even now, and choices are limited.
I'm going to save your bacon anyway and give you a crash course on DQ, making an expert out of you without you so much as setting a finger to the keyboard. By the end of this article, you should have a good idea of what DQ is and what your options are for implementation. We will cover …
What is Data Quality (DQ)?
The DQ tooling landscape.
Homegrown DQ for your team.
Closing thoughts.
Let's get to it, you rabble of data scoundrels.
What is Data Quality (DQ)?
I don’t think this is a particularly hard question to answer, but I think the devil is in the details of the answer. Of course, DQ is all about our data, but what exactly is it about our data sets that needs “quality” applied to it?
“I’m going to postulate that Data Quality can be boiled down into two main categories: first, data-specific attributes, and second, business-related attributes.”
All Data Quality-related discussions and topics can be tossed into one of these two proverbial barrels.
Data-specific attributes.
Business-related attributes.
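To make the two barrels a little more concrete, here is a minimal sketch in plain Python. It rests on my own assumptions: that data-specific attributes mean things like missing values and duplicate keys, and that business-related attributes mean rules like "an order amount can't be negative." The rows, column names, and the `VALID_STATUSES` set are all hypothetical and exist only for illustration.

```python
# Hypothetical order rows; the columns are illustrative assumptions only.
rows = [
    {"order_id": 1, "amount": 25.0, "status": "shipped"},
    {"order_id": 2, "amount": None, "status": "shipped"},
    {"order_id": 2, "amount": -5.0, "status": "teleported"},
]

# Assumed set of statuses the (hypothetical) business recognizes.
VALID_STATUSES = {"pending", "shipped", "delivered"}


def data_specific_issues(rows):
    """Checks about the data itself: null values and duplicate keys."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        if row["amount"] is None:
            issues.append((i, "amount is null"))
        if row["order_id"] in seen_ids:
            issues.append((i, "duplicate order_id"))
        seen_ids.add(row["order_id"])
    return issues


def business_issues(rows):
    """Checks that only make sense with business context."""
    issues = []
    for i, row in enumerate(rows):
        if row["amount"] is not None and row["amount"] < 0:
            issues.append((i, "negative amount"))
        if row["status"] not in VALID_STATUSES:
            issues.append((i, "unknown status"))
    return issues


data_issues = data_specific_issues(rows)
biz_issues = business_issues(rows)
print("data-specific:", data_issues)
print("business-related:", biz_issues)
```

The point of splitting the checks into two functions is that the first could run against almost any table with no context, while the second needs someone who knows the business to write the rules.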