Metadata
- Author: timkellogg.me
- Full Title:: Explainer: What’s R1 & Everything Else?
- Category:: 🗞️Articles
- Document Tags:: Foundational models
- URL:: https://timkellogg.me/blog/2025/01/25/r1
- Read date:: 2025-02-02
Highlights
the path forward is simple, basic RL.
Inference Time Scaling Laws: This is about reasoning models, like o1 & R1. The longer they think, the better they perform. It wasn't clear, however, exactly how to spend that extra computation to get better results. The naive assumption was that Chain of Thought (CoT) could work; you just train the model to do CoT.
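To make "spend more computation at inference" concrete, here is a minimal sketch (my illustration, not from the post) of how one might measure an inference-time scaling curve: sweep the thinking budget and record accuracy. `generate_with_budget` is a hypothetical stand-in for a real model call; it is simulated here so the script runs on its own.

```python
# Sketch of an inference-time scaling measurement (assumptions labeled below).
import random

random.seed(0)

def generate_with_budget(problem: int, thinking_tokens: int) -> int:
    """Toy stand-in for an LLM call: it returns the correct answer more often
    when allowed to 'think' longer. A real experiment would call an actual
    reasoning model with a max-token or effort setting instead."""
    p_correct = min(0.95, 0.3 + thinking_tokens / 4000)  # assumed toy curve
    return problem * 2 if random.random() < p_correct else problem * 2 + 1

problems = list(range(100))           # toy problems: "what is 2 * x?"
answers = [p * 2 for p in problems]

for budget in (128, 512, 2048, 8192):  # tokens of chain of thought allowed
    correct = sum(
        generate_with_budget(p, budget) == a
        for p, a in zip(problems, answers)
    )
    print(f"thinking budget {budget:>5} tokens -> accuracy {correct / len(problems):.2f}")
```

The printed curve rises with the budget, which is the scaling-law shape the highlight is describing; the open question the post raises is how to train a model so that extra tokens actually buy that improvement.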
It turns out CoT is best. R1 is just doing simple, single-line chain of thought trained by RL (maybe entropix was on to something?). It's safe to assume o1 is doing the same.
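As a rough illustration of "chain of thought trained by RL" (this is my toy sketch, not DeepSeek's or the post's actual recipe), here is REINFORCE with a verifiable, rule-based reward: sample an output, check whether the answer is correct, and nudge the policy toward whatever earned reward. The strategy names, success rates, and tabular policy are all made-up stand-ins for a real language model.

```python
# Toy REINFORCE loop with a verifiable reward (illustrative assumptions only).
import math
import random

random.seed(0)

STRATEGIES = ["guess", "short_cot", "long_cot"]
# Assumed hidden probability that each strategy yields a verifiably correct answer.
SUCCESS_RATE = {"guess": 0.2, "short_cot": 0.6, "long_cot": 0.9}

logits = [0.0, 0.0, 0.0]   # policy parameters, one logit per strategy
LR = 0.1

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def verifier_reward(strategy: str) -> float:
    """Rule-based reward: 1 if the (simulated) answer checks out, else 0."""
    return 1.0 if random.random() < SUCCESS_RATE[strategy] else 0.0

for step in range(2000):
    probs = softmax(logits)
    idx = random.choices(range(len(STRATEGIES)), weights=probs)[0]
    reward = verifier_reward(STRATEGIES[idx])
    advantage = reward - 0.5   # crude fixed baseline to reduce variance
    # REINFORCE: grad of log pi(a) w.r.t. logits is one_hot(a) - probs
    for i in range(len(logits)):
        grad = (1.0 if i == idx else 0.0) - probs[i]
        logits[i] += LR * advantage * grad

for name, p in zip(STRATEGIES, softmax(logits)):
    print(f"{name:>9}: {p:.2f}")
```

After training, the probability mass concentrates on the strategy the verifier rewards most ("long_cot" in this toy), which is the basic dynamic behind rewarding chains of thought that end in a checkable correct answer; the real systems do this at the scale of an LLM's token-level policy rather than three fixed strategies.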