* Tags:: #🗞️Articles, [[Data quality|Data Quality]], [[Data culture|Data culture]]
* Author:: [[krishna-puttaswamy|Krishna Puttaswamy]] and [[suresh-srinivas|Suresh Srinivas]]
* Link:: [Uber's Journey Toward Better Data Culture From First Principles | Uber Engineering Blog](https://eng.uber.com/ubers-journey-toward-better-data-culture-from-first-principles/)
* Source date:: [[2021-03-16]]
* Finished date:: [[2021-04-24]]
- Data Quality Checks from Uber:
>Freshness: time delay between production of data and when the data is 99.9% complete in the destination system including a watermark for completeness (default set to 3 9s), as simply optimizing for freshness without considering completeness leads to poor quality decisions.
>
>Completeness: % of rows in the destination system compared to the # of rows in the source system.
>
>Duplication: % of rows that have duplicate primary or unique keys, defaulting to 0% duplicate in raw data tables, while allowing for a small % of duplication in modeled tables.
>
>Cross-data-center consistency: % of data loss when a copy of a dataset in the current datacenter is compared to the copy in another datacenter.
>
>Semantic checks: captures critical properties of fields in the data such as null/not-null, uniqueness, # of distinct values, and range of values.
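
A minimal Python sketch of how four of the checks quoted above (freshness, completeness, duplication, semantic checks) might be computed. It is an illustration under stated assumptions, not Uber's implementation: the function names, the in-memory row format (a list of dicts), and the reading of "duplicate rows" as every row that shares its key with another row are all hypothetical choices made for this note.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def completeness_pct(source_rows: int, destination_rows: int) -> float:
    """Completeness: % of destination rows relative to source rows."""
    if source_rows == 0:
        return 100.0  # assumption: an empty source counts as fully complete
    return 100.0 * destination_rows / source_rows


def duplication_pct(rows, key_fields) -> float:
    """Duplication: % of rows whose key is shared with at least one other row."""
    if not rows:
        return 0.0
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    duplicated = sum(c for c in counts.values() if c > 1)
    return 100.0 * duplicated / len(rows)


def freshness_delay(produced_at: datetime, arrival_times, watermark: float = 0.999) -> timedelta:
    """Freshness: delay until `watermark` (a fraction) of the rows have arrived."""
    if not arrival_times:
        raise ValueError("no rows have arrived yet")
    ordered = sorted(arrival_times)
    # Number of rows that must be present to reach the completeness watermark.
    needed = max(1, math.ceil(watermark * len(ordered)))
    return ordered[needed - 1] - produced_at


def semantic_profile(values) -> dict:
    """Semantic checks: null count, uniqueness, distinct count, value range."""
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(distinct),
        "is_unique": len(distinct) == len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }
```

In practice these values would be compared against per-table thresholds, for example 0% duplication for raw tables and a small allowance for modeled tables, as the quote describes.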