![rw-book-cover](https://cdn.pixabay.com/photo/2022/07/18/11/12/statue-7329573_1280.jpg)

## Metadata
- Author: [[timkellogg.me|Timkellogg]]
- Full Title: Explainer: What's R1 & Everything Else?
- Category: #🗞️Articles
- Document Tags: [[Foundation models|Foundation Models]]
- URL: https://timkellogg.me/blog/2025/01/25/r1
- Read date: [[2025-02-02]]

## Highlights
> the path forward is simple, basic RL. ([View Highlight](https://read.readwise.io/read/01jjkbjresbjd2bvtsvgg14eaw))

> **Inference Time Scaling Laws**
> This is about **reasoning models**, like o1 & R1. [The longer they think, the better they perform.](https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute)
> It wasn’t, however, clear how exactly one should perform *more computation* in order to achieve better results. The naive assumption was that [Chain of Thought (CoT)](https://www.promptingguide.ai/techniques/cot) could work; you just train the model to do CoT. ([View Highlight](https://read.readwise.io/read/01jjkbmc1wwe335a72am86ys26))

> It turns out **CoT is best**. R1 is just doing simple, single-line chain of thought trained by RL (maybe [entropix](https://timkellogg.me/blog/2024/10/10/entropix) was on to something?). Safe to assume o1 is doing the same. ([View Highlight](https://read.readwise.io/read/01jjkbmftty53n6xnmhckpw1p1))
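The highlights describe reasoning models that emit a chain of thought before answering. A minimal sketch of how one might separate an R1-style reasoning trace from the final answer; the `<think>…</think>` delimiters follow R1's public output format, while the helper names and example completion are illustrative assumptions, not code from the article:

```python
def build_cot_prompt(question: str) -> str:
    """Wrap a question so the model emits its reasoning before answering."""
    return (
        "Think step by step inside <think>...</think>, "
        "then give the final answer.\n\n"
        f"Question: {question}"
    )

def split_reasoning(completion: str) -> tuple[str, str]:
    """Separate the chain-of-thought trace from the final answer."""
    start = completion.find("<think>") + len("<think>")
    end = completion.find("</think>")
    reasoning = completion[start:end].strip()
    answer = completion[end + len("</think>"):].strip()
    return reasoning, answer

# Hypothetical completion in the R1 style:
completion = "<think>2 apples + 3 apples = 5 apples</think> 5"
reasoning, answer = split_reasoning(completion)
```

The point of the scaling claim is that a longer `reasoning` span (more test-time compute) tends to yield a better `answer`; the parsing itself is trivial because the trace is just a delimited prefix of the completion.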