
## Metadata
- Author:: [[timkellogg.me|Tim Kellogg]]
- Full Title:: Explainer: What's R1 & Everything Else?
- Category:: #🗞️Articles
- Document Tags:: [[Foundation models|Foundation Models]]
- URL:: https://timkellogg.me/blog/2025/01/25/r1
- Read date:: [[2025-02-02]]
## Highlights
> the path forward is simple, basic RL. ([View Highlight](https://read.readwise.io/read/01jjkbjresbjd2bvtsvgg14eaw))
> Inference Time Scaling Laws
> This is about **reasoning models**, like o1 & R1. [The longer they think, the better they perform.](https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute)
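A toy illustration of the scaling-law claim above: with best-of-N sampling (one common way to spend more test-time compute, and the topic of the linked HuggingFace post), the expected quality of the best trace rises as N grows. `sample_trace` is a stand-in for a real model call, not any actual API:

```python
import random

# Toy model of inference-time scaling via best-of-N sampling: each
# "reasoning trace" gets a quality score, and spending more compute
# (larger N) raises the expected quality of the best trace we keep.

def sample_trace(rng: random.Random) -> float:
    """Pretend to run one chain of thought; return its quality score."""
    return rng.gauss(0.0, 1.0)

def best_of_n(n: int, rng: random.Random) -> float:
    """More samples = more inference compute = better best-case answer."""
    return max(sample_trace(rng) for _ in range(n))

rng = random.Random(0)
for n in (1, 4, 16, 64):
    avg = sum(best_of_n(n, rng) for _ in range(200)) / 200
    print(f"N={n:3d}  mean best score={avg:.2f}")  # rises with N
```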
> It wasn’t, however, clear how exactly one should perform *more computation* in order to achieve better results. The naive assumption was that [Chain of Thought (CoT)](https://www.promptingguide.ai/techniques/cot) could work; you just train the model to do CoT. ([View Highlight](https://read.readwise.io/read/01jjkbmc1wwe335a72am86ys26))
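For reference, CoT means getting the model to emit intermediate reasoning before its final answer. A minimal sketch of the prompting form, assuming a hypothetical `complete(prompt)` LLM call; the trigger phrase and few-shot worked example are the standard pattern from the CoT literature:

```python
# "Let's think step by step" plus a worked example elicits intermediate
# reasoning before the final answer. `complete` is a hypothetical LLM call.
COT_PROMPT = """\
Q: A train travels 60 km in 1.5 hours. What is its average speed?
A: Let's think step by step. Speed = distance / time = 60 / 1.5 = 40 km/h.
The answer is 40 km/h.

Q: {question}
A: Let's think step by step."""

def ask_with_cot(complete, question: str) -> str:
    """Prompt the model to reason out loud before answering."""
    return complete(COT_PROMPT.format(question=question))
```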
> It turns out **CoT is best**. R1 is just doing simple, single-line chain of thought trained by RL (maybe [entropix](https://timkellogg.me/blog/2024/10/10/entropix) was on to something?). Safe to assume o1 is doing the same. ([View Highlight](https://read.readwise.io/read/01jjkbmftty53n6xnmhckpw1p1))
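A heavily simplified sketch of what "CoT trained by RL" can look like: sample several traces per problem, reward only verifiably correct final answers, and push the policy toward above-average traces (a group-relative baseline in the spirit of R1's GRPO). `policy` and its methods are illustrative stand-ins, not the actual training code:

```python
# Simplified RL-on-CoT loop: no learned reward model, just a verifiable
# reward on the final answer. `policy.sample_cot`, `policy.extract_answer`,
# and `policy.reinforce` are illustrative stand-ins.

def verifiable_reward(answer: str, gold: str) -> float:
    """Binary reward: 1 if the final answer checks out, else 0."""
    return 1.0 if answer.strip() == gold.strip() else 0.0

def rl_step(policy, problem: str, gold: str, group_size: int = 8) -> None:
    # Sample a group of chains of thought for the same problem.
    traces = [policy.sample_cot(problem) for _ in range(group_size)]
    rewards = [verifiable_reward(policy.extract_answer(t), gold) for t in traces]
    # Group-relative baseline: advantage is reward minus the group mean.
    baseline = sum(rewards) / len(rewards)
    for trace, r in zip(traces, rewards):
        policy.reinforce(trace, advantage=r - baseline)  # favor better CoT
```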