- Tags:: #📝CuratedNotes , [[Data engineering|Data engineering]], [[Data analysis|Data analysis]]
Related: [[Metrics frameworks|Metrics Frameworks]]
![[Historia de la filosofía#^53961f]]
## First, what is "a metric"?
A metric is an aggregation over your facts or dimensions. [The metrics layer has growing up to do - Amit’s Newsletter](https://prakasha.substack.com/p/the-metrics-layer-has-growing-up) lists six kinds of metrics:
> 1. Simple aggregations
> 2. Aggregation with scalar functions (sum(Revenue) - sum(Cost))
> 3. Metrics that require joins (e.g., conversion rates with [[Slowly changing dimensions|Slowly Changing Dimensions]])
> 4. Metrics with window functions
> 5. Metrics with multiple aggregation levels (e.g., ratios in market share).
> 6. Multi-fact metrics (e.g., sales and purchases).
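
A minimal sketch of kinds 1, 2, and 5 from the list above, using Python's built-in `sqlite3` module. The `sales` table, its columns, and the numbers are hypothetical and only serve as an illustration:

```python
import sqlite3

# Hypothetical toy fact table; names and numbers are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (brand TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("a", 100.0, 60.0), ("a", 50.0, 30.0), ("b", 50.0, 40.0)],
)

# 1. Simple aggregation.
total_revenue = conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]

# 2. Aggregation with scalar functions: sum(Revenue) - sum(Cost).
profit = conn.execute(
    "SELECT SUM(revenue) - SUM(cost) FROM sales"
).fetchone()[0]

# 5. Multiple aggregation levels: per-brand revenue divided by total revenue.
market_share = dict(
    conn.execute(
        "SELECT brand, SUM(revenue) / (SELECT SUM(revenue) FROM sales) "
        "FROM sales GROUP BY brand"
    ).fetchall()
)
```

The market share query needs two aggregation levels at once (per brand and overall), which is why that kind of metric is harder to express than a plain `SUM`.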
## What is the problem solved by the metrics layer?
The metrics layer provides a centralized definition of metrics outside of BI tools. We want to define metrics centrally because, as explained in [[The missing piece of the modern data stack|The Missing Piece Of The Modern Data Stack]]:
![[The missing piece of the modern data stack#^6a03e0]]
With a way to define metrics...
>we’re going to avoid two people defining sales as different numbers on a per-record level (does it include tax or not?), or using a different timezone to aggregate these numbers (are we using UTC, our head office time, or the local time of the store we sold things in?). ([What's an OLAP cube? 🎲 - Analytics Engineers Club](https://analyticsengineers.club/whats-an-olap-cube/))
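
To make the quoted problem concrete, here is a small sketch in which the same two sales produce different daily numbers depending on the timezone and on whether tax is included. The data, the fixed UTC-5 store offset, and the `daily_sales` helper are all assumptions made for illustration:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical sales: (timestamp in UTC, net amount, tax).
SALES = [
    (datetime(2021, 3, 1, 23, 30, tzinfo=timezone.utc), 100.0, 20.0),
    (datetime(2021, 3, 2, 2, 0, tzinfo=timezone.utc), 50.0, 10.0),
]

# Assumed store offset of UTC-5, fixed for simplicity (no DST handling).
STORE_TZ = timezone(timedelta(hours=-5))


def daily_sales(tz, include_tax):
    """Sum sales per calendar day as seen in the given timezone."""
    totals = defaultdict(float)
    for ts, net, tax in SALES:
        day = ts.astimezone(tz).date().isoformat()
        totals[day] += net + (tax if include_tax else 0.0)
    return dict(totals)


# Two plausible but incompatible definitions of "daily sales".
utc_net = daily_sales(timezone.utc, include_tax=False)
store_gross = daily_sales(STORE_TZ, include_tax=True)
```

In UTC the sales fall on two different days, while in store time both fall on March 1st, so two analysts could each report a defensible but different "sales on March 1st".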
There are two things to define: the base table, and the aggregation itself.
> In the world of BI, a metric is a succinct summarization of data to make it easily palatable to humans. Inherent to this are two concepts — the formula to be applied to summarize the data (metric formula definition) and the data to be summarized (metric data definition). In most BI tools, these concepts are conflated into one and exist as the combined “metric definition”, locked up inside the BI tool.
>
> We think these should be split apart.
>
> The complex SQL query that produces the rows needed by the metric should be defined separately from the metric definition (the SUM or COUNT or AVERAGE operation performed by the metric). This is a fundamental concept that allows us to centralize data production for a metric and manage metric data lineage in the data warehouse — very similar to how data transformation is handled by DBT ([The 7 traits of a modern metrics stack](https://blog.falkon.ai/the-7-traits-of-a-modern-metrics-stack-15403d488318?gif=true))
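
The split described in the quote could be sketched as follows. This is not the API of any real metrics tool: `MetricData`, `MetricFormula`, `to_sql`, and the `orders` table are hypothetical names chosen for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricData:
    """Metric data definition: the query that produces the rows."""
    name: str
    sql: str


@dataclass(frozen=True)
class MetricFormula:
    """Metric formula definition: the aggregation applied to those rows."""
    name: str
    aggregation: str  # e.g. SUM, COUNT, AVG
    column: str
    data: MetricData

    def to_sql(self, group_by=()):
        # Plain string building: fine for a sketch, unsafe for untrusted input.
        dims = ", ".join(group_by)
        select = f"{dims}, " if dims else ""
        tail = f" GROUP BY {dims} ORDER BY {dims}" if dims else ""
        return (
            f"SELECT {select}{self.aggregation}({self.column}) AS {self.name} "
            f"FROM ({self.data.sql}) AS {self.data.name}{tail}"
        )


# One data definition, reused by two formulas.
paid_orders = MetricData(
    "paid_orders",
    "SELECT country, amount FROM orders WHERE status = 'paid'",
)
revenue = MetricFormula("revenue", "SUM", "amount", paid_orders)
order_count = MetricFormula("order_count", "COUNT", "amount", paid_orders)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (country TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ES", 10.0, "paid"), ("ES", 5.0, "refunded"),
     ("FR", 7.0, "paid"), ("ES", 3.0, "paid")],
)

revenue_by_country = conn.execute(revenue.to_sql(group_by=("country",))).fetchall()
total_orders = conn.execute(order_count.to_sql()).fetchone()[0]
```

Because the filter on paid orders lives only in `paid_orders`, changing what counts as a paid order updates both metrics at once.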
You may think views are enough, but they fall short for at least two reasons. First, you may end up with an explosion of views once you consider all the dimensions and grains you want to offer ([[Coalesce 2021. the metric system|Coalesce 2021]]). Second, at some point you may want to materialize those views so that they are precomputed ([[Rollup tables|Rollup Tables]]).
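A rough way to see the explosion of views: with one view per time grain and per subset of dimensions, a single metric with `n` dimensions and `g` grains needs `g * 2^n` views. The dimension and grain names below are hypothetical, and the naming scheme is only one possible convention:

```python
from itertools import combinations

DIMENSIONS = ["country", "channel", "product", "plan"]  # hypothetical
GRAINS = ["day", "week", "month"]  # hypothetical


def view_names(metric, dimensions, grains):
    """List one view name per (grain, subset of dimensions) pair."""
    names = []
    for grain in grains:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                suffix = "_by_" + "_".join(dims) if dims else ""
                names.append(f"{metric}_{grain}{suffix}")
    return names


views = view_names("revenue", DIMENSIONS, GRAINS)
print(len(views))  # 3 grains * 2**4 dimension subsets = 48
```

Each additional dimension doubles the count, and that is for a single metric.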
BI tools let you define metrics. However, we want to access those definitions from places other than the BI tools themselves. From [[The missing piece of the modern data stack|The Missing Piece Of The Modern Data Stack]]:
![[screenshot.png|400]]
## Implementations
This is the current proposal (as of [[2021-01-01]]) of [[DBT|Dbt]] for metrics:
* [How to Version Control Your Metrics to Create a Single Source of Truth for Business Metrics - YouTube](https://www.youtube.com/watch?v=zNeR3BXSOGw&t=946s)
However, they seem to be moving towards more direct support for metrics:
* [[Feature] dbt should know about metrics · Issue #4071 · dbt-labs/dbt-core](https://github.com/dbt-labs/dbt-core/issues/4071)
* [Metrics | dbt Docs](https://docs.getdbt.com/docs/dbt-cloud/dbt-cloud-api/metadata/schema/metadata-schema-metrics)
* [dbt Metrics Framework Playbook](https://www.getdbt.com/metrics-playbook/#!/overview)
Additionally, there are several open source packages offering metrics on top of dbt. A good overview is in [this article](https://medium.com/@vfisa/an-overview-of-metric-layer-offerings-a9ddcffb446e). On my own, I found:
* [Metriql Docs | Metriql Docs](https://metriql.com/)
* [lightdash/lightdash: An open source alternative to Looker built using dbt. Made for analysts ❤️](https://github.com/lightdash/lightdash) (though this one would also act as an addition to, or a replacement for, Metabase)
[[metabase|Metabase]] also supports metrics ([07 Segments and Metrics (metabase.com)](https://www.metabase.com/docs/latest/administration-guide/07-segments-and-metrics.html)), but there is no way to link them with [[DBT|Dbt]] yet, at least not using [[dbt-metabase|Dbt Metabase]] ([Setting Metabase Metrics using dbt properties · Issue #25 · gouline/dbt-metabase (github.com)](https://github.com/gouline/dbt-metabase/issues/25))
Of course, big companies design their own systems, such as Minerva in the case of Airbnb: [How Airbnb Achieved Metric Consistency at Scale | by Robert Chang | The Airbnb Tech Blog | Medium](https://medium.com/airbnb-engineering/how-airbnb-achieved-metric-consistency-at-scale-f23cc53dea70)
## Other references
- You may want to make snapshots of your metrics: https://erikapullum.com/blog/snapshot-metrics
- Note that the metrics layer can also be called a metric store: [What is a metrics store? Why your data team should define business metrics in code - Transform Data](https://blog.transform.co/what-is-a-metrics-store-why-your-data-team-should-define-business-metrics-in-code/).
- An amazing explanation by [[Vicki boykis|Vicki Boykis]] of how... the truth does not really exist: [All numbers are made up, some are useful - by Vicki Boykis (substack.com)](https://vicki.substack.com/p/all-numbers-are-made-up-some-are).
- [Why cohort analysis beats all other approaches to calculating LTV (lifetimely.io)](https://www.lifetimely.io/blog-posts/why-cohort-analysis-beats-all-other-approaches-to-calculating-ltv) ^aa69e4
- [[Metric trees|Metric Trees]]