This episode of Last Week in AI highlights a surge in new AI models and agent tools, with major releases from Anthropic (Opus 4.7), Meta (Muse Spark), and OpenAI (GPT 5.4 Cyber, Codex updates). The discussion also covers significant developments in AI safety research, including Anthropic's automated weak-to-strong researcher achieving remarkable results, and growing concerns over AI-related violence and geopolitical threats to critical AI infrastructure.
Summarized by Podsumo
New Models & Agents: Anthropic's Opus 4.7 (improved reasoning, vision) and Managed Agents platform, Meta's Muse Spark (multimodal, 10x compute efficiency claim), and OpenAI's GPT 5.4 Cyber (defensive cybersecurity) and Codex (computer use, long-term memory) are driving rapid competition.
AI Safety Breakthrough: Anthropic's automated weak-to-strong researcher achieved a 97% Performance Gap Recovery (vs. 23% for humans) in alignment tasks, suggesting automated alignment research is practical for outcome-gradable problems.
Geopolitical Tensions & Violence: Sam Altman's home and OpenAI HQ were attacked by an individual citing AI extinction fears, while Iran threatened OpenAI's Abu Dhabi data center, underscoring escalating real-world risks and the use of AI in propaganda.
Infrastructure Race: Anthropic secured a multi-year deal with CoreWeave (now hosting 9/10 top AI model providers), highlighting the intense global competition for compute capacity and the associated financial risks for infrastructure providers.
Fragile Alignment: Research by the UK AI Security Institute confirmed that steering vectors can suppress evaluation awareness in models, but also revealed the fragility and unpredictable side effects of such interventions, with random vectors sometimes being equally effective.
"The power of AI doesn't come from a model alone. It comes from giving AI access to the right enterprise content."
"This strongly suggests that automating research on at least Alignment problems that are outcome-gradable, where you can actually quantify and assign a concrete metric to those problems, is already practical."
— Jeremy Harris
"If that happened with a bank, basically, I mean, you name it. Drain all the accounts. Delete the ledgers. This and that, we're back to a paperwork society."
— Jeremy Harris