Dwarkesh Podcast

The next big breakthrough will be AIs learning on the job

20 min

The episode argues that the next major breakthrough in AI will come from enabling models to learn continuously on the job, moving beyond pre-training and reinforcement learning from verifiable rewards (RLVR). It critiques current approaches for their sample inefficiency and their inability to learn from real-world, unstructured data, and explores techniques such as on-policy self-distillation (OPSD) and 'dreaming' as ways to enable continual learning from deployment. It also highlights the tension between scaling current methods and the need for new architectures to achieve true general intelligence.

Summarized by Podsumo
