Metadata
- Author:: confessionsofadataguy.com
- Full Title:: Part 3 – Data Modeling in Data Warehouses, Data Lakes, and Lake Houses.
- Category:: 🗞️Articles
- URL:: https://www.confessionsofadataguy.com/part-3-data-modeling-in-data-warehouses-data-lakes-and-lake-houses/
- Finished date:: 2023-03-30
Highlights
In the classic relational SQL database model, you focused on data deduplication, data normalization, and star schema to the extreme. Typically in the Data Lake and Lake House world, we would not model our file-based data sink in this manner.
Data Lakes and Lake Houses will contain Accumulators and Descriptors, not necessarily Facts and Dimensions.
Accumulators in a Data Lake aggregate or accumulate the transactional records, much like a Fact table did in the past. The main difference is that the Accumulator table will probably contain fewer “keys” pointing to other tables; in the classic SQL Kimball model, those attributes would most likely have been stripped out and put into a Dimension table.
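As a rough sketch of that idea (the table, column, and path names below are hypothetical, not from the article), an Accumulator keeps descriptive attributes inline at the transactional grain instead of replacing them with surrogate keys into Dimension tables:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("accumulator-sketch").getOrCreate()

# Hypothetical raw transactional records landed in the lake.
orders = spark.read.parquet("s3://my-lake/raw/orders/")

# A denormalized Accumulator: the grain is still one row per transaction,
# but attributes a Kimball model would push into dim_customer / dim_product
# stay inline rather than becoming surrogate keys.
order_accumulator = orders.select(
    "order_id",
    "order_timestamp",
    "order_total",
    "customer_name",
    "customer_segment",
    "product_name",
    "product_category",
)

order_accumulator.write.mode("overwrite").parquet(
    "s3://my-lake/model/order_accumulator/"
)
```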
The Descriptors in the new Data Lake or Lake House are exactly like Dimensions; they are just less broken up and normalized, so there are fewer of them. If the business requires, say, a distinct list of addresses, that would just be done as an additional sub-table, probably run and filled much like a Data Mart or analytics table further downstream. It wouldn’t be built into the Descriptor tables and given keys that are referenced in an Accumulator table.
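Continuing the same hypothetical sketch, the distinct-address list can be derived downstream from a wide Descriptor table rather than modeled as a keyed Dimension:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("descriptor-sketch").getOrCreate()

# A wide, denormalized Descriptor table (hypothetical schema).
customers = spark.read.parquet("s3://my-lake/model/customer_descriptor/")

# The distinct address list is built as a downstream sub-table, much like
# a Data Mart, instead of being keyed back into the Accumulator.
distinct_addresses = customers.select(
    "street", "city", "state", "postal_code"
).dropDuplicates()

distinct_addresses.write.mode("overwrite").parquet(
    "s3://my-lake/marts/distinct_addresses/"
)
```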
Big data sets cannot be queried without a good partition strategy; hence, partitions drive the data model.
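For instance (again with hypothetical names), writing the Accumulator partitioned by date means the date column must exist in the model itself, which is how the partition strategy ends up shaping the data model:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partition-sketch").getOrCreate()

orders = spark.read.parquet("s3://my-lake/model/order_accumulator/")

# Derive the partition column from the event timestamp; queries that filter
# on order_date can then prune whole directories of files.
(
    orders.withColumn("order_date", F.to_date("order_timestamp"))
    .write.partitionBy("order_date")
    .mode("overwrite")
    .parquet("s3://my-lake/model/order_accumulator_partitioned/")
)
```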