* Tags:: #📝CuratedNotes , [[Data engineering|Data engineering]]

Data modeling comprises different techniques to structure information so that answering most business questions is easy. You should not need to predict in advance which questions will be asked, and you should not need ad-hoc transformations every time a new question comes up.

My own explanation of dimensional modeling in a company: [[Fight the entropy|Fight The Entropy]]. This is also closely related to the [[pagesmetrics-layer|Pages/metrics Layer]].

## Styles

Roughly ordered by how popular and widely applied they are nowadays (the later ones are more exotic):

- [[Kimball dimensional modeling|Kimball Dimensional Modeling]]
- [[Data vault|Data Vault]]
- [[OBT - One big table|Obt One Big Table]]
- [[New in activity schema 2.0|New In Activity Schema 2]]
- [[Introducing entity centric data modeling for analytics|Introducing Entity Centric Data Modeling For Analytics]] by [[Maxime beauchemin|Maxime Beauchemin]]
- [[Bill inmon|Bill Inmon]] in the [[building-the-data-warehouse|Building The Data Warehouse]] book (roughly 3NF)
- [Functional Kimball](https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a) by [[Maxime beauchemin|Maxime Beauchemin]]

## Preferred way

The best advice comes from this critique of [[Kimball dimensional modeling|Kimball Dimensional Modeling]]: [Kimball in the context of the modern data warehouse: what's worth keeping, and what's not - YouTube](https://www.youtube.com/watch?v=3OcS2TMXELU). Another critique covers OBT:

![[OBT - One big table#^586b16]]

Start simple with OBT. On the other hand, you can use [[Kimball dimensional modeling|Kimball Dimensional Modeling]] as an intermediate step towards OBT. Other modern articles share this view, such as [Star Schema vs. OBT for Data Warehouse Performance | Blog | Fivetran](https://fivetran.com/blog/star-schema-vs-obt):

> staging your ELT process such that the data all get transformed into something like a star schema before everything gets re-joined back

## [[Slowly changing dimensions|Slowly Changing Dimensions]] type 2

For SCD2 in the modern data stack, there are two alternatives: do them directly with Airbyte, or do them with dbt snapshots.

Joining a fact table with an SCD2 table is easy. Add a date condition to the join so that each fact row matches the dimension version that was valid on the fact's date. For a dbt snapshot this looks something like `f.event_date >= d.dbt_valid_from and (f.event_date < d.dbt_valid_to or d.dbt_valid_to is null)`.

## Data modeling in [[Lakehouse. Convergence of data lake and data warehouse|Lakehouse]]

[[medallion-architecture|Medallion Architecture]] + [[Data vault|Data Vault]] + [[Kimball dimensional modeling|Kimball Dimensional Modeling]]: [Data Warehousing Modeling Techniques and Their Implementation on the Databricks Lakehouse Platform](https://www.databricks.com/blog/2022/06/24/data-warehousing-modeling-techniques-and-their-implementation-on-the-databricks-lakehouse-platform.html)

## Metadata propagation

[gouline/dbt-metabase: Model synchronization from dbt to Metabase.](https://github.com/gouline/dbt-metabase) seems to be a very interesting package. It provides two-way sync between [[DBT|Dbt]] and [[metabase|Metabase]]. Metadata is propagated to Metabase, and dashboards appear in the dbt lineage as [Exposures | dbt Docs](https://docs.getdbt.com/docs/building-a-dbt-project/exposures).

## Other refs

A very nice example for explaining to other people why modeling matters: [Data Modeling Layer & Concepts | The Analytics Setup Guidebook (holistics.io)](https://www.holistics.io/books/setup-analytics/data-modeling-layer-and-concepts/)

The truth hierarchy puts the burden of proof on the new model. From [[Productizing analytics. a retrospective|Productizing Analytics]]:

> We also enforced a dashboard “truth hierarchy”, that is, if in the process of exploring data in Looker, a stakeholder developed a metric that “disagreed” with a metric already established in the canonical dashboards, the burden of proof was on them to understand the misalignment. The data team was available to help with quick investigations through a company Slack channel, but the goal was to teach stakeholders how to fish and to avoid having to spend all of their time chasing down numbers that didn’t quite agree
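This is a minimal sketch of the date-bounded SCD2 join from the Slowly Changing Dimensions section. It is written in plain Python so it runs without a warehouse. The output rows are flat, which also illustrates the "star schema first, then re-join into OBT" idea.

It makes several assumptions:

- `DimCustomerVersion`, `FactOrder`, the `segment` attribute and `point_in_time_join` are all hypothetical.
- The `dbt_valid_from`/`dbt_valid_to` names mirror the columns dbt snapshots add, and a null `dbt_valid_to` marks the current version.
- Validity intervals are treated as half-open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimCustomerVersion:
    # One row per version of a customer, as an SCD2 table stores it.
    # Column names mirror dbt snapshot meta-columns; the table is hypothetical.
    customer_id: int
    segment: str
    dbt_valid_from: date
    dbt_valid_to: Optional[date]  # None marks the current version


@dataclass(frozen=True)
class FactOrder:
    # Hypothetical fact table row.
    order_id: int
    customer_id: int
    order_date: date
    amount: float


def is_valid_at(version: DimCustomerVersion, day: date) -> bool:
    """True if this version was in effect on `day`, using [valid_from, valid_to)."""
    if day < version.dbt_valid_from:
        return False
    return version.dbt_valid_to is None or day < version.dbt_valid_to


def point_in_time_join(facts, dim_versions):
    """Left-join each fact to the dimension version valid on the fact's date.

    Mirrors the SQL join condition:
      f.customer_id = d.customer_id
      and f.order_date >= d.dbt_valid_from
      and (f.order_date < d.dbt_valid_to or d.dbt_valid_to is null)
    Returns flat dicts, i.e. a tiny "one big table".
    """
    obt = []
    for fact in facts:
        matches = [
            v for v in dim_versions
            if v.customer_id == fact.customer_id and is_valid_at(v, fact.order_date)
        ]
        # A well-formed SCD2 table has non-overlapping intervals, so at most one
        # version matches. Facts with no match are kept (left join) with None.
        segment = matches[0].segment if matches else None
        obt.append({
            "order_id": fact.order_id,
            "order_date": fact.order_date,
            "amount": fact.amount,
            "customer_id": fact.customer_id,
            "customer_segment": segment,
        })
    return obt


if __name__ == "__main__":
    dim = [
        DimCustomerVersion(1, "smb", date(2023, 1, 1), date(2023, 6, 1)),
        DimCustomerVersion(1, "enterprise", date(2023, 6, 1), None),
    ]
    facts = [
        FactOrder(100, 1, date(2023, 3, 15), 50.0),
        FactOrder(101, 1, date(2023, 6, 1), 900.0),
    ]
    for row in point_in_time_join(facts, dim):
        print(row["order_id"], row["customer_segment"])
```

Running it prints `100 smb` and then `101 enterprise`. The order placed on the day the customer changed segment picks up the new version because the interval is half-open. In a warehouse, the same logic is just the join condition in the docstring. The Python loop only makes the semantics explicit.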