![rw-book-cover](https://substackcdn.com/image/fetch/w_1200,h_600,c_fill,f_jpg,q_auto:good,fl_progressive:steep,g_auto/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b571-bdbe-46cc-aa5c-8fd5e5555b01_720x430.png)

## Metadata

- Author:: [[jack-morris|Jack Morris]]
- Full Title:: There Are No New Ideas in AI… Only New Datasets
- Category:: #🗞️Articles
- Document Tags:: [[Foundation models|Foundation Models]]
- URL:: https://blog.jxmo.io/p/there-are-no-new-ideas-in-ai-only
- Read date:: [[2025-07-08]]

## Highlights

> If you squint just a little, these four things (DNNs → Transformer LMs → RLHF → Reasoning) summarize everything that’s happened in AI. ([View Highlight](https://read.readwise.io/read/01jzkbf0f7hnavtzbr9gw47sek))

> each of these four breakthroughs **enabled us to learn from a new data source:**
> 1. AlexNet and its follow-ups unlocked [ImageNet](https://www.image-net.org/), a large database of class-labeled images that drove fifteen years of progress in computer vision
> 2. Transformers unlocked training on “The Internet” and a race to download, categorize, and parse all the text on [The Web](https://arxiv.org/abs/2101.00027) (which [it seems](https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications) [we’ve mostly done](https://arxiv.org/abs/2305.16264) [by now](https://arxiv.org/abs/2305.13230))
> 3. RLHF allowed us to learn from human labels indicating what “good text” is (mostly a vibes thing)
> 4. Reasoning seems to let us learn from [“verifiers”](http://incompleteideas.net/IncIdeas/KeytoAI.html), things like calculators and compilers that can evaluate the outputs of language models ([View Highlight](https://read.readwise.io/read/01jzkbgpfmr1cb77hqges6nhn1))

> As one salient example, some researchers worked on [developing a new BERT-like model using an architecture other than transformers](https://arxiv.org/abs/2212.10544). They spent a year or so tweaking the architecture in hundreds of different ways, and managed to produce a different type of model (this is a state-space model or “SSM”) that performed about equivalently to the original transformer when trained on the same data.
> This discovered equivalence is really profound because it hints that **there is an upper bound to what we might learn from a given dataset**. All the training tricks and model upgrades in the world won’t get around the cold hard fact that there is only so much you can learn from a given dataset. ([View Highlight](https://read.readwise.io/read/01jzkbk0pz53r41jn4rjjw58mq))

> And maybe this apathy to new ideas is what we were supposed to take away from [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html). ([View Highlight](https://read.readwise.io/read/01jzkbk93h8bjq9ay0nqv1yvcj))
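Item 3 in the second highlight (“learn from human labels indicating what ‘good text’ is”) is commonly operationalized as a reward model trained on pairwise preferences with a Bradley-Terry-style objective. The article doesn’t give code; this is a minimal sketch of that loss, with made-up scalar scores standing in for a reward model’s outputs:

```python
import math

# Sketch of the pairwise-preference objective commonly used for RLHF reward
# modeling (Bradley-Terry loss). The scores are illustrative stand-ins for a
# reward model's outputs on two candidate responses; the human label says
# which response was preferred. None of this is the article's own code.

def preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-probability that the human-preferred response wins."""
    win_prob = 1.0 / (1.0 + math.exp(score_rejected - score_preferred))
    return -math.log(win_prob)

# The loss shrinks as the reward model rates the preferred text higher,
# so minimizing it distills the human "vibes" labels into a scalar reward.
print(preference_loss(2.0, 0.5))  # ~0.20: model agrees with the human
print(preference_loss(0.5, 2.0))  # ~1.70: model disagrees with the human
```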
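Item 4’s “verifiers” (calculators, compilers) can be as simple as a program that checks a model’s answer against a computed ground truth and emits a reward. A toy sketch of that idea; the input format and the 0/1 reward scale are assumptions for illustration, not anything specified in the article:

```python
# Sketch of a "verifier": a program, not a human, scores the model's output.
# Here Python's own arithmetic plays the role of the calculator.

def arithmetic_verifier(expression: str, model_answer: str) -> float:
    """Return reward 1.0 if the model's answer matches the computed value."""
    try:
        # Evaluate trusted toy expressions only; builtins are stripped.
        truth = eval(expression, {"__builtins__": {}})
        return 1.0 if float(model_answer) == float(truth) else 0.0
    except (ValueError, SyntaxError, NameError, ZeroDivisionError):
        return 0.0  # unparseable or invalid answers earn no reward

# The verifier turns a calculator into a training signal: correct outputs
# get reward, so the model can learn from checkable problems at scale.
print(arithmetic_verifier("12 * (3 + 4)", "84"))  # -> 1.0
print(arithmetic_verifier("12 * (3 + 4)", "85"))  # -> 0.0
```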