
## Metadata
- Author: [[mikkel-dengsøe-from-inside-data-by-mikkel-de-ngsøe|Mikkel Dengsøe from Inside Data by Mikkel Dengsøe]]
- Full Title:: Using AI to Build a Robust Testing Framework
- Category:: #🗞️Articles
- Document Tags:: [[vibe-coding|Vibe Coding]],
- Read date:: [[2025-07-23]]
## Highlights
> To test our data models, we’ll provide some guidelines to the LLM, mostly based on [this guide for testing best practices](https://substack.com/redirect/134b6315-f486-4dd4-a322-b60c601e0a21?j=eyJ1IjoiNDRpMmEifQ.txKr3BEB06jM7pp-5wphmyXof7jFdPvpfRX5kIjhK8g). Here’s a summary of our testing principles that we input to Cursor and Claude to keep track of. ([View Highlight](https://read.readwise.io/read/01k0vr3ak8naxsz5qssgr6etdy))
> General Principles
> - Use warn vs. error severity levels to reflect actual impact. Errors should block deployments; warnings highlight issues to monitor.
> - Data assets tagged importance: P1 (or upstream of these) require more extensive testing.
> - Avoid testing business logic assumptions that aren't visible from the SQL; focus on what can be objectively verified.
> - Fewer, high-signal tests are better than too many noisy or brittle ones.
> - Always leave a short comment on why each test is in place (e.g., “ensures IDs are unique to avoid joins blowing up”).
>
> Layer-specific Guidance
> - Sources: Test thoroughly using standard dbt source tests (e.g., unique, not_null, accepted_values). Mark source tables with a table_stats: true flag in YAML to activate SYNQ anomaly monitoring.
> - Staging: Avoid redundant tests. Only test columns that are transformed, derived, or critical for downstream joins/filters.
> - Marts: This is where business rules start to appear. Add custom tests only where you’ve validated the logic directly via SQL (e.g., verifying status values with a SELECT DISTINCT). Use this to prevent assumptions baked into dashboards.
>
> Additional Tips
> - Before writing a test, query the data directly to understand realistic constraints (e.g., should this value ever be zero? How many distinct values exist?).
> - When in doubt, prioritise coverage on high-impact data products and the metrics they feed.
>
> ([View Highlight](https://read.readwise.io/read/01k0vr3kjk387f0b915qj1w578))
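
A minimal sketch of how the source-layer guidance above might look in a dbt `sources` YAML file. This is not taken from the article: the source, table, and column names and the status values are hypothetical, and placing `importance` and `table_stats` under `meta` is an assumption (the highlight only says the flag lives "in YAML"). Exact key names can also differ by dbt version; for example, newer releases prefer `data_tests` over `tests`.

```yaml
version: 2

sources:
  - name: raw_shop              # hypothetical source name
    tables:
      - name: orders            # hypothetical table name
        meta:
          importance: P1        # assumption: importance tag stored under meta
          table_stats: true     # flag named in the highlight; placement under meta is an assumption
        columns:
          - name: order_id
            tests:
              # ensures IDs are unique to avoid joins blowing up
              # (default severity is error, so a failure blocks deployment)
              - unique
              - not_null
          - name: status
            tests:
              # warn only: an unexpected status should be reviewed, not block deployment
              # (values are placeholders; confirm them first with a SELECT DISTINCT)
              - accepted_values:
                  values: ['placed', 'shipped', 'returned']
                  config:
                    severity: warn
```

The split between default (error) severity on the key column and `severity: warn` on the status column follows the "reflect actual impact" principle: a duplicated key breaks downstream joins, while a new status value is usually something to monitor.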