Metadata

Highlights

Existing literature on proxy metrics concentrates mainly on the estimation of the long-term impact from short-term experimental data. (View Highlight)

Existing literature on proxy metrics [9, 1] has focused more on predicting the long-term effect, (View Highlight)

there is a trade-off between sensitivity and directionality: the more we increase sensitivity, the less likely our metric will be related to the north star. (View Highlight)

New highlights added 2025-09-14

North star metrics and online experimentation play a central role in how technology companies improve their products. In many practical settings, however, evaluating experiments based on the north star metric directly can be difficult. The two most significant issues are 1) low sensitivity of the north star metric and 2) differences between the short-term and long-term impact on the north star metric. A common solution is to rely on proxy metrics rather than the north star in experiment evaluation and launch decisions. Existing literature on proxy metrics concentrates mainly on the estimation of the long-term impact from short-term experimental data. In this paper, instead, we focus on the trade-off between the estimation of the long-term impact and the sensitivity in the short term. (View Highlight)

we have established two key properties for a metric: sensitivity and directionality. Empirically, we observe an inverse relationship between these two properties (View Highlight)
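Note to self: a minimal sketch of how these two properties could be measured over a set of past experiments. The definitions here are assumptions made for illustration and are not necessarily the paper's: sensitivity is taken as the share of experiments in which the proxy moves significantly, and directionality as the share of those significant movements whose sign agrees with the north star effect. The `ExperimentResult` type, its fields, and the 1.96 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    # All three fields are hypothetical names chosen for this sketch.
    proxy_effect: float        # estimated treatment effect on the proxy metric
    proxy_se: float            # standard error of that estimate (assumed > 0)
    north_star_effect: float   # estimated effect on the north star metric


def is_significant(r: ExperimentResult, z_crit: float = 1.96) -> bool:
    """Two-sided z-test on the proxy effect (assumed normal approximation)."""
    return abs(r.proxy_effect / r.proxy_se) > z_crit


def sensitivity(results: list[ExperimentResult], z_crit: float = 1.96) -> float:
    """Share of experiments in which the proxy moves significantly."""
    return sum(is_significant(r, z_crit) for r in results) / len(results)


def directionality(results: list[ExperimentResult], z_crit: float = 1.96) -> float:
    """Among significant proxy movements, share whose sign matches the north star."""
    significant = [r for r in results if is_significant(r, z_crit)]
    if not significant:
        return float("nan")
    agree = sum(r.proxy_effect * r.north_star_effect > 0 for r in significant)
    return agree / len(significant)


# Made-up numbers, purely to exercise the functions.
results = [
    ExperimentResult(0.5, 0.1, 0.2),    # significant, same sign
    ExperimentResult(-0.3, 0.1, 0.1),   # significant, opposite sign
    ExperimentResult(0.05, 0.1, 0.3),   # not significant
    ExperimentResult(0.4, 0.1, 0.05),   # significant, same sign
]

print(sensitivity(results))
print(directionality(results))
```

Computing both numbers for each candidate proxy over the same experiment set is one way to see the inverse relationship: a proxy that fires on more experiments tends to include more movements that disagree with the north star.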

Choose proxies with common sense. The best auxiliary metrics in our proxy metric captured intuitive, critical aspects of the specific user journey targeted by that class of experiments. For example, whether a user had a satisfactory watch from the homepage is a good auxiliary metric for experiments changing the recommendations on the home feed. In fact, many of the best auxiliary metrics were already informally used by engineers, suggesting that common sense metrics have superior statistical properties. (View Highlight)