8 Comments
Feb 5 · Liked by Daniel Beach

There was one comment on Reddit suggesting that getting contracts implemented required playing an elaborate game of politics, blaming upstream teams for warehouse downtime in post-mortems. Good luck doing that in most places without getting yourself thrown in the bin...

Feb 5 · edited Feb 9 · Liked by Daniel Beach

Awesome write-up. I agree the idea sounds good, and we, as data engineers, have been fighting bad data for decades. We just called it schema change or schema evolution.

IMO, data quality tools integrated into orchestrators are the way, especially if the orchestrator is data-asset-driven in a declarative way. That means you can create assertions on top of data assets (your dbt tables, your data marts) rather than on data pipelines, so every time a data asset gets updated, you are certain the "contract" (the assertions) holds.
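
As a hedged sketch of that pattern: here is what an assertion attached to a data asset (rather than a pipeline) could look like using Dagster's asset checks. The comment names no specific orchestrator, so Dagster is my assumption, and the table and columns are hypothetical.

```python
# A minimal sketch of an assertion on a data asset, not a pipeline.
# Dagster (asset-driven, declarative) is assumed; table/columns are made up.
import pandas as pd
from dagster import asset, asset_check, AssetCheckResult

@asset
def orders() -> pd.DataFrame:
    # Stand-in for a real dbt table or data mart.
    return pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 24.50, 5.00]})

@asset_check(asset=orders)
def orders_contract(orders: pd.DataFrame) -> AssetCheckResult:
    # The "contract": these assertions are evaluated whenever the asset updates.
    no_null_keys = bool(orders["order_id"].notna().all())
    non_negative = bool((orders["amount"] >= 0).all())
    return AssetCheckResult(passed=no_null_keys and non_negative)
```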

Feb 5 · Liked by Daniel Beach

I suspect a great way would be for a SaaS like Snowflake to create the concept of a data contract object (semantics, more than just columns and data types). Ideally there would be an ISO standard defining data contracts for most core business entities.
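
A hedged sketch of what such a first-class contract object could look like, written here as a plain Python dataclass since no platform actually exposes one today; every field and value is illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class DataContract:
    # Hypothetical first-class contract object, as the comment imagines a
    # platform like Snowflake might expose. All fields are illustrative.
    entity: str                       # core business entity, e.g. "customer"
    owner: str                        # accountable team or person
    semantics: dict[str, str]         # column -> business meaning, not just type
    columns: dict[str, str]           # column -> physical data type
    quality_rules: list[str] = field(default_factory=list)  # SQL assertions

customer_contract = DataContract(
    entity="customer",
    owner="crm-team@example.com",
    semantics={"customer_id": "stable surrogate key, never reused"},
    columns={"customer_id": "NUMBER", "email": "VARCHAR"},
    quality_rules=["SELECT COUNT(*) = 0 FROM customers WHERE customer_id IS NULL"],
)
```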


I would like to disagree:

- Accessing data always means using an interface or an API; without a data contract you use it without consent and without certainty that the data and its quality will be preserved. An SQL interface is still an API.

- Data contracts provide certainty about who in the organisation owns or manages the data or the data source. The owners of data are very often not data engineers.

- Quality tools should be integrated into data contracts.

- Nobody is forcing you to use Avro or Protobuf.

- Nobody is preventing you from using Python and SQL (see the sketch below).

Data contracts are a tool to make the use of data more robust between different teams. They provide a framework for discussion.
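
To make that concrete, here is a minimal sketch of a cross-team contract in plain Python and SQL, with no Avro or Protobuf anywhere; the table, teams, and checks are all hypothetical.

```python
import sqlite3

# A hypothetical contract between two teams: who owns the data, and which
# assertions must hold on the shared table. Plain Python and SQL throughout.
CONTRACT = {
    "table": "orders",
    "owner": "checkout-team",          # the owning team, not a data engineer
    "consumer": "analytics-team",
    "checks": [
        ("primary key is never null",
         "SELECT COUNT(*) FROM orders WHERE order_id IS NULL"),
        ("amounts are non-negative",
         "SELECT COUNT(*) FROM orders WHERE amount < 0"),
    ],
}

def verify(conn: sqlite3.Connection) -> bool:
    # Run every check; each query counts violating rows, so 0 means pass.
    ok = True
    for name, query in CONTRACT["checks"]:
        violations = conn.execute(query).fetchone()[0]
        if violations:
            print(f"contract broken ({name}): {violations} bad rows, "
                  f"page {CONTRACT['owner']}")
            ok = False
    return ok

# Demo on an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 9.99), (2, -5.0)")
print("contract satisfied:", verify(conn))
```

The contract then doubles as documentation (who owns what, who consumes it) and as an executable quality check, which is exactly the cross-team framework for discussion described above.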
