Alex Rives discusses ESMC (Evolutionary Scale Modeling Cambrian), a new generation of protein language models from BioHub that approaches protein biology from a world-modeling perspective. The key insight is that scaling laws hold for protein models trained on diverse metagenomic data, leading to emergent capabilities such as structure prediction and antibody design without traditional inductive biases like multiple sequence alignments.
Summarized by Podsumo
ESMC is trained on 6.8 billion non-redundant protein sequences (including metagenomic data) and resolves predicted structures for 1.1 billion of them, creating the most comprehensive picture of protein structure and function to date.
The model achieves state-of-the-art structure prediction without multiple sequence alignments (MSAs), outperforming prior methods, especially for antibody design, where evolutionary information is less useful.
Mechanistic interpretability with sparse autoencoders reveals a hierarchy of biological features inside the model, from basic biochemical properties to large functional themes, that emerge purely from next-token prediction on protein sequences.
BioHub's Virtual Biology Initiative commits $400M internally and $100M externally to scale cellular data generation, aiming to create digital representations of cells that can generalize to predict novel interventions.
The team has demonstrated the ability to design protein binders and single-chain antibodies (scFvs) with therapeutic-level affinity by searching the world model.
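For readers unfamiliar with the sparse autoencoders mentioned in the takeaways above, the following is a minimal sketch of the general technique, not the team's implementation. It assumes a ReLU encoder, a linear decoder, and an L1 sparsity penalty, which is one common formulation. The dimensions, the random weights, and the names `sae_forward` and `sae_loss` are hypothetical, and the input vector is a random stand-in for a model activation.

```python
import random

def matvec(W, v):
    # W is a list of rows; returns the matrix-vector product W @ v.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    # Encoder: subtract the decoder bias, project up, apply ReLU so that
    # feature activations are non-negative.
    centered = [a - b for a, b in zip(x, b_dec)]
    pre = [h + b for h, b in zip(matvec(W_enc, centered), b_enc)]
    f = [max(0.0, p) for p in pre]
    # Decoder: reconstruct the activation as a weighted sum of feature directions.
    x_hat = [h + b for h, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    # Squared reconstruction error plus an L1 penalty that pushes most
    # feature activations toward zero.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(a) for a in f)
    return recon + l1_coeff * sparsity

random.seed(0)
d_model, d_features = 8, 32  # hypothetical sizes; the dictionary is wider than the input
x = [random.gauss(0, 1) for _ in range(d_model)]  # stand-in for one activation vector
W_enc = [[random.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_features)]
b_enc = [0.0] * d_features
W_dec = [[random.gauss(0, 0.1) for _ in range(d_features)] for _ in range(d_model)]
b_dec = [0.0] * d_model

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, f, x_hat)
```

The sketch only evaluates the objective on untrained weights. In practice the weights would be optimized over many activation vectors, and the learned feature directions would then be inspected for biological meaning.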
"The thing that I was motivated by is, if you think about evolution and the data we have around proteins, those sequences contain patterns reflecting the constraints evolution is operating on. The idea behind ESM was to apply this principle across all of evolution, across the vast diversity of proteins."
"Alex Rives"
"I think one answer is compression and the idea that the model needs to develop underlying latent variables to solve the sequence prediction task. The model would start to have hidden variables representing the biology."
"Alex Rives"
"We're building a scientific institution for this new paradigm—powered by frontier experimental biology, frontier technology for measurement, and frontier artificial intelligence—all open source. Our mission is to cure or prevent disease."
"Alex Rives"