13 Comments
John

Although this is a Databricks community-driven project, is it possible to import the library natively in Microsoft Fabric notebooks or other notebooks?

Daniel Beach

If you can pip install a Python package, you can use it.

Otrera

No, you can’t. Read the License file in the repo: you can use it with Databricks only.

Sébastien Hoarau

Superbly written, as always. I can 100% relate to the first part 🤣

John

Great article 👏

Herminio Vazquez

Have you tried cuallee?

Near-zero dependencies and DataFrame-agnostic.

Otrera

Looking at it now, it looks very basic: a lack of checks, no filters, no reporting, no quarantine table.

bombercorny

I only see single-column checks. Is it possible to check whether the combination of col1 and col2 is unique?
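For reference, here is a minimal plain-Python sketch of what a composite-uniqueness check does, assuming rows are plain dicts. The helper name `duplicated_combinations` is hypothetical; it is not a function from the library discussed in the article. It builds a key tuple such as `(col1, col2)` for each row and flags every row whose key appears more than once.

```python
from collections import Counter


def duplicated_combinations(rows, columns):
    """Return the rows whose combination of `columns` is not unique.

    Hypothetical helper for illustration only; `rows` is a list of dicts.
    """
    # Build one key tuple per row from the chosen columns, e.g. (col1, col2).
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    # A row fails the check if its key tuple occurs more than once.
    return [row for row, key in zip(rows, keys) if counts[key] > 1]


rows = [
    {"col1": "a", "col2": 1},
    {"col1": "a", "col2": 2},
    {"col1": "a", "col2": 1},
    {"col1": "b", "col2": 1},
]
dupes = duplicated_combinations(rows, ["col1", "col2"])
print(dupes)  # [{'col1': 'a', 'col2': 1}, {'col1': 'a', 'col2': 1}]
```

In PySpark the same idea can be expressed with standard DataFrame calls, e.g. `df.groupBy("col1", "col2").count().filter("count > 1")`, which lists the non-unique key combinations.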

Junaid Effendi

Thanks for the article. What's different from Soda or GE?

Alexis

One thing I have noticed is that you can prevent bad data from being ingested into your storage and store it in a separate location.

Correct me if I'm wrong, but Soda and GE check data quality once the data has been ingested into your storage.

Junaid Effendi

I see, you mean the quarantine tables.

Yeah, I use GE, but we have written custom code on top of it to handle that case. It seems this tool already provides that.

Alexis

Yes, correct. You get one df that passes the DQ checks and another df for quarantine.
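To make the valid/quarantine split concrete, here is a minimal plain-Python sketch of the pattern, assuming rows are dicts and each check is a simple predicate. The function name, the check names, and the `_failed_checks` field are all hypothetical; this illustrates the idea only and is not the library's own API.

```python
def split_valid_and_quarantine(rows, checks):
    """Split rows into (valid, quarantine) lists.

    `checks` maps a hypothetical check name to a predicate that returns True
    when the row passes. Failing rows go to quarantine together with the
    names of the checks they failed, so they can be inspected later.
    """
    valid, quarantine = [], []
    for row in rows:
        failed = [name for name, passes in checks.items() if not passes(row)]
        if failed:
            # Keep the original columns plus a record of which checks failed.
            quarantine.append({**row, "_failed_checks": failed})
        else:
            valid.append(row)
    return valid, quarantine


checks = {
    "id_not_null": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: r.get("amount") is not None and r["amount"] >= 0,
}
rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]
valid, quarantine = split_valid_and_quarantine(rows, checks)
print(len(valid), len(quarantine))  # 1 2
```

Only the valid rows would then be written to the main table, while the quarantine rows go to a separate location, which is the "stop bad data before it lands" behaviour contrasted with Soda and GE in the thread above.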