Latent Space: The AI Engineer Podcast

🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik

1h 25min

Noetik is tackling the high failure rate of cancer drug trials by developing AI models that improve patient selection. They achieve this by generating massive, multimodal datasets from human tumor samples, training self-supervised foundation models to understand patient biology, and predicting drug response from standard H&E images. This approach aims to identify therapeutically relevant cancer subtypes and enable more effective clinical trial design.

Summarized by Podsumo

✨ Key Takeaways

1. Contrarian Thesis: Cancer drug failures (90-95%) are primarily due to poor patient selection, not pharmacology, as patients respond heterogeneously.

2. Proprietary Multimodal Data: Noetik generates its own large-scale, high-quality datasets including H&E, protein stains, spatial transcriptomics, and genotyping from human tumors, rejecting reliance on traditional cell lines or public data.

3. AI for Patient Stratification: Their self-supervised foundation models learn complex patient biology to identify therapeutically relevant cancer subtypes, predicting drug response from simple H&E images for trial design and potential diagnostics.

4. First-of-its-Kind Licensing Deal: A $50 million deal with GSK to license Noetik's OctoVC model signifies a new era of foundation model licensing in biopharma, allowing pharma companies to leverage and fine-tune advanced AI capabilities.

5. Conviction in Data Scale: Success in AI for biology requires unprecedented data scale and intentional design, with Noetik spending years building data infrastructure before even training initial models.

💬 Notable Quotes

"Most of those drugs fail, we'd argue, because we're bad at selecting which patients those drugs are going to work in."

— Ron Alfa
"My hunch is that biology is pretty complex and that we still need to generate a lot more data."

— Daniel Bear
"It's like you can't do the AI R&D like or build the algorithms until you have good enough data set to tell you whether your favorite algorithmic idea is actually working or not."

— Daniel Bear