11 Comments

Although it is a Databricks community-driven project, is it possible to import this library natively in Microsoft Fabric notebooks or other notebooks?

If you can pip install a Python package, you can use it

Superbly written, as always. Can 100% relate to the first part 🤣

Great article 👏

Have you tried cuallee?

Near-zero dependencies and dataframe-agnostic.

I only see single-column checks. Is it possible to check whether the combination of col1 and col2 is unique?
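Whatever library is used, a uniqueness check over a combination of columns amounts to counting each combination of key values and flagging any that appear more than once. Below is a minimal plain-Python sketch, assuming rows are dictionaries; the function `duplicate_key_combinations` and the sample data are hypothetical and not part of any data quality library's API.

```python
from collections import Counter


def duplicate_key_combinations(rows, key_cols):
    """Return each combination of key_cols values that occurs more than once,
    mapped to how many times it occurs (hypothetical helper, for illustration)."""
    # Build one tuple per row from the key columns, then count the tuples.
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    # Keep only the combinations that break composite uniqueness.
    return {key: n for key, n in counts.items() if n > 1}


rows = [
    {"col1": "a", "col2": 1},
    {"col1": "a", "col2": 2},
    {"col1": "a", "col2": 1},
    {"col1": "b", "col2": 1},
]

print(duplicate_key_combinations(rows, ["col1", "col2"]))
# {('a', 1): 2}
```

On a Spark DataFrame the same idea is usually written as `df.groupBy("col1", "col2").count().filter("count > 1")`; an empty result means the pair is unique.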

Thanks for the article. What's different from Soda or GE?

One thing I have noticed is that you can prevent bad data from being ingested into your storage and store it in a separate location instead.

Correct me if I'm wrong, but Soda and GE check data quality only once the data has already been ingested into your storage.

I see, you mean the quarantine tables.

Yeah, I use GE, but we have written custom code on top of it to handle that case. It seems this tool already provides that.

Yes, correct. You get one DataFrame with the rows that pass the DQ checks and a second DataFrame with the rows sent to quarantine.
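The split described in this reply can be sketched generically: evaluate every check on each row, send rows that pass all checks to one output, and send rows that fail any check to a quarantine output annotated with the checks they failed. This is a minimal plain-Python sketch assuming rows are dictionaries and checks are named predicates; `split_valid_and_quarantine`, the `_failed_checks` field, and the example checks are hypothetical and do not reflect the library's actual API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Check = Callable[[Row], bool]


def split_valid_and_quarantine(
    rows: List[Row], checks: Dict[str, Check]
) -> Tuple[List[Row], List[Row]]:
    """Route each row to `valid` if it passes every check, else to `quarantine`.

    Quarantined rows get a hypothetical `_failed_checks` field listing the
    names of the checks they failed.
    """
    valid: List[Row] = []
    quarantine: List[Row] = []
    for row in rows:
        failed = [name for name, check in checks.items() if not check(row)]
        if failed:
            quarantine.append({**row, "_failed_checks": failed})
        else:
            valid.append(row)
    return valid, quarantine


# Hypothetical checks: a non-null id and a numeric, non-negative amount.
checks: Dict[str, Check] = {
    "id_not_null": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}

rows: List[Row] = [
    {"id": 1, "amount": 10},
    {"id": None, "amount": 5},
    {"id": 3, "amount": -2},
]

valid, quarantine = split_valid_and_quarantine(rows, checks)
print(len(valid), len(quarantine))
# 1 2
```

Keeping both outputs lets the clean rows continue downstream while the quarantined rows, with the reason they failed, can be written to a separate table for inspection or reprocessing.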

Thanks