13 Comments
John

Although it's a Databricks community-driven project, is it possible to import this library natively in Microsoft Fabric notebooks or other notebooks?

Daniel Beach

If you can pip install a Python package there, you can use it.

Otrera

No, you can’t. Read the license file in the repo: you can only use it with Databricks.

Sébastien Hoarau

Superbly written, as always. I can 100% relate to the first part 🤣

John

Great article 👏

Herminio Vazquez

Have you tried cuallee?

It has near-zero dependencies and is DataFrame-agnostic.

Otrera

Looking at it now, it seems very basic: a limited set of checks, no filters, no reporting, no quarantine table.

bombercorny

I only see single-column checks. Is it possible to check whether the combination of col1 and col2 is unique?
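For readers wondering what such a check involves: a composite uniqueness check flags every (col1, col2) pair that appears more than once, even when neither column is unique on its own. The sketch below is plain Python over a list of dicts, not this library's API; the helper name `find_duplicate_keys` and the sample rows are hypothetical. In PySpark the same idea is usually a `groupBy("col1", "col2").count()` filtered to `count > 1`.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def find_duplicate_keys(
    rows: Iterable[Mapping[str, object]], key_cols: Sequence[str]
) -> dict[tuple, int]:
    """Return each key combination that occurs more than once, with its count.

    Hypothetical helper for illustration; not part of any data-quality library.
    """
    # Count how often each tuple of key-column values appears.
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    # Keep only the combinations that violate uniqueness.
    return {key: n for key, n in counts.items() if n > 1}


rows = [
    {"col1": "a", "col2": 1},
    {"col1": "a", "col2": 2},
    {"col1": "a", "col2": 1},  # repeats the ("a", 1) combination
    {"col1": "b", "col2": 1},
]

dupes = find_duplicate_keys(rows, ["col1", "col2"])
print(dupes)  # {('a', 1): 2}
print("combination unique:", not dupes)  # combination unique: False
```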

Junaid Effendi

Thanks for the article. What's different from Soda or GE?

Alexis

One thing I have noticed is that you can prevent bad data from being ingested into your storage and store it in a separate location instead.

Correct me if I'm wrong, but Soda and GE check data quality only once the data has been ingested into your storage.

Junaid Effendi

I see, you mean the quarantine tables.

Yeah, I use GE, but we have written custom code on top of it to handle that case. It seems this tool already provides that.

Alexis

Yes, correct. You get one DataFrame with the rows that pass the DQ checks and another DataFrame with the rows to quarantine.
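The valid/quarantine split described here can be sketched as follows. This is a minimal, dependency-free illustration of the quarantine pattern under assumed rules, not the tool's actual API: the check names, the `split_valid_and_quarantine` helper, and the `_errors` field are all hypothetical. Each row is evaluated against every check; rows that fail any check go to the quarantine output with the names of the failed checks attached, and the rest go to the valid output.

```python
from typing import Callable, Mapping

Row = dict[str, object]
Check = Callable[[Row], bool]

# Hypothetical checks: each returns True when the row passes.
CHECKS: Mapping[str, Check] = {
    "id_not_null": lambda r: r.get("id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def split_valid_and_quarantine(
    rows: list[Row], checks: Mapping[str, Check]
) -> tuple[list[Row], list[Row]]:
    """Route each row to the valid or the quarantine output based on the checks."""
    valid: list[Row] = []
    quarantine: list[Row] = []
    for row in rows:
        failed = [name for name, check in checks.items() if not check(row)]
        if failed:
            # Keep the original fields and record which checks failed.
            quarantine.append({**row, "_errors": failed})
        else:
            valid.append(row)
    return valid, quarantine


rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]
valid, quarantine = split_valid_and_quarantine(rows, CHECKS)
print(len(valid), len(quarantine))  # 1 2
print(quarantine[1]["_errors"])  # ['amount_non_negative']
```

In a Spark pipeline, the valid output would typically be written to the target table and the quarantined rows to a separate table, so bad records never land next to the good ones.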

Junaid Effendi

Thanks
