Metadata
- Author: timkellogg.me
- Full Title:: Explainer: What’s R1 & Everything Else?
- Category:: 🗞️Articles
- Document Tags:: Foundational models
- URL:: https://timkellogg.me/blog/2025/01/25/r1
- Read date:: 2025-02-02
Highlights
the path forward is simple, basic RL.
Inference Time Scaling Laws: This is about reasoning models, like o1 & R1. The longer they think, the better they perform. It wasn't clear, however, exactly how to spend that extra computation to get better results. The naive assumption was that Chain of Thought (CoT) could work; you just train the model to do CoT.
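To make "spend more computation at inference" concrete, here is a minimal sketch (my illustration, not from the post) of how one might measure an inference-time scaling curve: sweep the thinking budget and record accuracy. `generate_with_budget` is a hypothetical stand-in for a real model call; it is simulated here so the script runs on its own.

```python
# Sketch of an inference-time scaling measurement (assumptions labeled below).
import random

random.seed(0)

def generate_with_budget(problem: int, thinking_tokens: int) -> int:
    """Toy stand-in for an LLM call: it returns the correct answer more often
    when allowed to 'think' longer. A real experiment would call an actual
    reasoning model with a max-token or effort setting instead."""
    p_correct = min(0.95, 0.3 + thinking_tokens / 4000)  # assumed toy curve
    return problem * 2 if random.random() < p_correct else problem * 2 + 1

problems = list(range(100))           # toy problems: "what is 2 * x?"
answers = [p * 2 for p in problems]

for budget in (128, 512, 2048, 8192):  # tokens of chain of thought allowed
    correct = sum(
        generate_with_budget(p, budget) == a
        for p, a in zip(problems, answers)
    )
    print(f"thinking budget {budget:>5} tokens -> accuracy {correct / len(problems):.2f}")
```

The printed curve rises with the budget, which is the scaling-law shape the highlight is describing; the open question the post raises is how to train a model so that extra tokens actually buy that improvement.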
It turns out CoT is best. R1 is just doing simple, single-line chain of thought trained by RL (maybe entropix was on to something?). It's safe to assume o1 is doing the same.
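As a rough illustration of "chain of thought trained by RL" (this is my toy sketch, not DeepSeek's or the post's actual recipe), here is REINFORCE with a verifiable, rule-based reward: sample an output, check whether the answer is correct, and nudge the policy toward whatever earned reward. The strategy names, success rates, and tabular policy are all made-up stand-ins for a real language model.

```python
# Toy REINFORCE loop with a verifiable reward (illustrative assumptions only).
import math
import random

random.seed(0)

STRATEGIES = ["guess", "short_cot", "long_cot"]
# Assumed hidden probability that each strategy yields a verifiably correct answer.
SUCCESS_RATE = {"guess": 0.2, "short_cot": 0.6, "long_cot": 0.9}

logits = [0.0, 0.0, 0.0]   # policy parameters, one logit per strategy
LR = 0.1

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def verifier_reward(strategy: str) -> float:
    """Rule-based reward: 1 if the (simulated) answer checks out, else 0."""
    return 1.0 if random.random() < SUCCESS_RATE[strategy] else 0.0

for step in range(2000):
    probs = softmax(logits)
    idx = random.choices(range(len(STRATEGIES)), weights=probs)[0]
    reward = verifier_reward(STRATEGIES[idx])
    advantage = reward - 0.5   # crude fixed baseline to reduce variance
    # REINFORCE: grad of log pi(a) w.r.t. logits is one_hot(a) - probs
    for i in range(len(logits)):
        grad = (1.0 if i == idx else 0.0) - probs[i]
        logits[i] += LR * advantage * grad

for name, p in zip(STRATEGIES, softmax(logits)):
    print(f"{name:>9}: {p:.2f}")
```

After training, the probability mass concentrates on the strategy the verifier rewards most ("long_cot" in this toy), which is the basic dynamic behind rewarding chains of thought that end in a checkable correct answer; the real systems do this at the scale of an LLM's token-level policy rather than three fixed strategies.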