Uber's Journey Toward Better Data Culture From First Principles | Uber Engineering Blog

* Tags:: #🗞️Articles, [[Data quality|Data Quality]], [[Data culture|Data culture]]
* Author:: [[krishna-puttaswamy|Krishna Puttaswamy]] and [[suresh-srinivas|Suresh Srinivas]]
* Link:: [Uber's Journey Toward Better Data Culture From First Principles | Uber Engineering Blog](https://eng.uber.com/ubers-journey-toward-better-data-culture-from-first-principles/)
* Source date:: [[2021-03-16]]
* Finished date:: [[2021-04-24]]

- Data Quality Checks from Uber:
  > Freshness: time delay between production of data and when the data is 99.9% complete in the destination system including a watermark for completeness (default set to 3 9s), as simply optimizing for freshness without considering completeness leads to poor quality decisions.
  >
  > Completeness: % of rows in the destination system compared to the # of rows in the source system.
  >
  > Duplication: % of rows that have duplicate primary or unique keys, defaulting to 0% duplicate in raw data tables, while allowing for a small % of duplication in modeled tables.
  >
  > Cross-data-center consistency: % of data loss when a copy of a dataset in the current datacenter is compared to the copy in another datacenter.
  >
  > Semantic checks: captures critical properties of fields in the data such as null/not-null, uniqueness, # of distinct values, and range of values.
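
To make the five checks concrete for myself, here is a minimal Python sketch of how each one could be computed over in-memory data. This is not Uber's implementation: every function name, parameter, and data shape below is hypothetical, and a real pipeline would run equivalent queries against the source and destination systems. Two interpretations are my own assumptions: duplication counts every row whose key occurs more than once, and cross-data-center loss is measured as keys present in the other datacenter but missing locally.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def freshness_delay(produced_at, arrival_times, total_rows, watermark=0.999):
    """Delay until `watermark` of `total_rows` has landed in the destination.

    Returns None if the destination has not reached the watermark yet.
    """
    arrivals = sorted(arrival_times)
    needed = math.ceil(watermark * total_rows)  # rows required for "complete"
    if needed == 0 or needed > len(arrivals):
        return None
    return arrivals[needed - 1] - produced_at


def completeness_pct(source_rows, destination_rows):
    """% of source rows that are present in the destination."""
    if source_rows == 0:
        return 100.0
    return 100.0 * destination_rows / source_rows


def duplication_pct(keys):
    """% of rows whose primary/unique key occurs more than once (assumption)."""
    keys = list(keys)
    if not keys:
        return 0.0
    counts = Counter(keys)
    duplicated_rows = sum(c for c in counts.values() if c > 1)
    return 100.0 * duplicated_rows / len(keys)


def cross_dc_loss_pct(local_keys, remote_keys):
    """% of keys in the other datacenter's copy that are missing locally."""
    remote = set(remote_keys)
    if not remote:
        return 0.0
    missing = remote - set(local_keys)
    return 100.0 * len(missing) / len(remote)


def semantic_profile(values):
    """Null count, distinct count, uniqueness, and value range of one field."""
    values = list(values)
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(distinct),
        "is_unique": len(distinct) == len(non_null),
        "min": min(non_null, default=None),
        "max": max(non_null, default=None),
    }


if __name__ == "__main__":
    t0 = datetime(2021, 3, 16)
    arrivals = [t0 + timedelta(minutes=m) for m in (1, 2, 3, 10)]
    print(freshness_delay(t0, arrivals, total_rows=4))  # waits for the last row
    print(completeness_pct(1000, 995))
    print(duplication_pct(["a", "b", "b", "c"]))
    print(cross_dc_loss_pct({1, 2, 3}, {1, 2, 3, 4}))
    print(semantic_profile([1, None, 3, 3]))
```

The freshness function ties the delay to the completeness watermark rather than to the first arriving row, which mirrors the note's point that optimizing for freshness without considering completeness leads to poor decisions. Each result would then be compared against a per-table threshold, such as 0% duplication for raw tables.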