My initial reaction to data contracts was the same as my reaction to the data mesh. Both struck me as a kind of Rorschach proposition: Something defined well enough that we can all sense its shape, but abstract enough that we can also project our own opinions on top of it. Shapeshifting ideas like these are magnets for debate—it’s easy to say what you think a cloud looks like—but impossible to pin down. The moment we agree on what one corner of it should be, the rest of it melts into something new.
Once they do, those expectations are codified. The exact mechanism for this seems to vary, though most arrangements involve sticking some service in between the data source and the database that checks whether incoming data meets the agreed-upon standard.
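To make that middle service less abstract, here is a rough sketch of what such a check could look like. Everything in it is an assumption for illustration: the `Field` class, the `ORDERS_CONTRACT` fields, and the `gate` function are hypothetical names, not part of any particular data contract tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One agreed-upon expectation about a column (hypothetical structure)."""
    name: str
    type_: type
    nullable: bool = False


# Hypothetical contract for an "orders" feed.
ORDERS_CONTRACT = [
    Field("order_id", int),
    Field("amount", float),
    Field("coupon", str, nullable=True),
]


def violations(record: dict, contract: list[Field]) -> list[str]:
    """Return human-readable contract violations; an empty list means valid."""
    problems = []
    for field in contract:
        if field.name not in record:
            problems.append(f"missing field: {field.name}")
            continue
        value = record[field.name]
        if value is None:
            if not field.nullable:
                problems.append(f"null not allowed: {field.name}")
        elif not isinstance(value, field.type_):
            problems.append(
                f"wrong type for {field.name}: "
                f"expected {field.type_.__name__}, got {type(value).__name__}"
            )
    return problems


def gate(records: list[dict], contract: list[Field]):
    """Split incoming records into those that meet the contract and those that don't."""
    accepted, rejected = [], []
    for record in records:
        problems = violations(record, contract)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected
```

In this sketch, only the `accepted` records would be written to the database; the `rejected` ones carry their reasons with them, so whoever gets notified can see exactly which expectation was broken.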
A data contract adds an expectation to these jobs by specifying what the result should look like. This not only makes the system more durable, but it also makes declarative DAGs possible.
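One way to read "declarative DAGs" is that, once every job states what it reads and what it promises to produce, the run order no longer has to be written by hand; it can be derived. The sketch below assumes that reading. The `JOBS` dictionary and its table names are made up for illustration, and the ordering relies on Python's standard-library `graphlib` rather than any specific orchestrator.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs, deliberately listed out of order. Each one declares the
# tables it reads and the single table it promises to produce.
JOBS = {
    "daily_revenue": {"reads": ["orders"], "produces": "revenue_by_day"},
    "load_orders": {"reads": [], "produces": "raw_orders"},
    "clean_orders": {"reads": ["raw_orders"], "produces": "orders"},
}


def run_order(jobs: dict) -> list[str]:
    """Derive an execution order from what each job declares it reads and produces."""
    # Map each table to the job that produces it.
    producer = {spec["produces"]: name for name, spec in jobs.items()}
    # A job depends on whichever jobs produce the tables it reads.
    graph = {
        name: {producer[table] for table in spec["reads"] if table in producer}
        for name, spec in jobs.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

Here `run_order(JOBS)` places `load_orders` before `clean_orders` and `clean_orders` before `daily_revenue`, even though nothing in `JOBS` spells out that sequence.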
Data teams can’t expect to stop changes to products or Salesforce; we can only hope to contain them.
We should be told when something changes, but it’s a notification, not a negotiation.
Data contracts shouldn’t introduce unnecessary and impractical negotiations to extract promises from data providers that they can’t and shouldn’t keep. They should instead be a simple defense, built by data teams for data teams, against communication failures.
Anything that writes to the database writes to a staging schema. We define data contracts (or dbt tests, as is often already done) against those tables. When the table updates, the test runs. If it passes, the table moves to the destination schema that we write to today.
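The staging-then-promote flow can be sketched in a few lines. This is a toy version under stated assumptions: it uses SQLite with an attached in-memory database standing in for the staging schema, and `promote_if_valid`, the `orders` table, and the two tests are hypothetical names. It borrows the dbt convention that a test is a query returning the offending rows, so zero rows means the test passes.

```python
import sqlite3


def promote_if_valid(conn, table, tests):
    """Run each test against staging.<table>; copy it to main only if all pass.

    Returns the names of failed tests (an empty list means the table was promoted).
    `table` is assumed to be a trusted identifier, not user input.
    """
    failures = [
        name
        for name, sql in tests.items()
        if conn.execute(sql).fetchone() is not None  # any row is a violation
    ]
    if failures:
        return failures  # leave the destination table untouched
    with conn:  # commit the swap as one transaction
        conn.execute(f"DROP TABLE IF EXISTS main.{table}")
        conn.execute(f"CREATE TABLE main.{table} AS SELECT * FROM staging.{table}")
    return []


# Toy setup: "main" plays the destination schema, "staging" the landing zone.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS staging")
conn.execute("CREATE TABLE staging.orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO staging.orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# Hypothetical contract, written as queries that select violating rows.
ORDERS_TESTS = {
    "order_id_not_null": "SELECT * FROM staging.orders WHERE order_id IS NULL",
    "order_id_unique": (
        "SELECT order_id FROM staging.orders "
        "GROUP BY order_id HAVING COUNT(*) > 1"
    ),
}

failed = promote_if_valid(conn, "orders", ORDERS_TESTS)
```

If a later load lands a row that breaks the contract, the function returns the failing test names and the destination table keeps its last good contents, which is the containment this setup is meant to provide.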