Invest Like the Best with Patrick O'Shaughnessy

Etched - Building AI Hardware to Make Inference Faster and Cheaper - [Invest Like the Best, EP.480]

1h 29min

Gavin Uberti and Rob Lockett, the founders of Etched, discuss their journey building a specialized AI chip for inference that outperforms incumbents through architectural bets like low-voltage operation and cluster-scale memory. They share how they overcame deep skepticism by vertically integrating, parallelizing development, and hiring industry legends alongside raw talent. The conversation explores why inference will become the world's largest market and how Etched's approach enables radically faster and cheaper AI inference at massive scale.

Summarized by Podsumo


✨ Key Takeaways

1. Etched's chip runs at under half the voltage of other AI chips, dramatically improving power efficiency by leveraging the physics of voltage scaling to cram more flops into the same silicon area without thermal throttling.
2. The company built a cluster-scale memory interconnect that reduces chip-to-chip latency by more than 5x compared to NVIDIA's Blackwell, enabling effective scaling to thousands of chips for serving massive models.
3. Etched achieved a 40-day bring-up from silicon arrival to running inference in a rack by parallelizing every possible step: the software stack, production line, and thermal solutions were all built before the chip even arrived.
4. They use a bimodal talent strategy pairing 'legends' (like Brian Loiler, who built NVIDIA's DGX rack team) with young, gritty 'chips-on-shoulders' individuals who combine first-principles thinking with extreme drive.
5. The founders emphasize a contrarian, 'assume it's possible' philosophy, citing the clock-domain crossing problem where they aligned two signals to within 50 picoseconds after many said it was unsolvable.
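
The voltage-scaling point in the first takeaway can be illustrated with the standard first-order relation for CMOS dynamic power, P ≈ α·C·V²·f. The sketch below is not from the episode and uses no Etched figures: the function name, the normalized constants, and the assumption that clock frequency stays fixed are all illustrative, and leakage power is ignored.

```python
def dynamic_power(voltage, frequency, capacitance=1.0, activity=1.0):
    """First-order CMOS dynamic power: P = activity * C * V^2 * f.

    All inputs are normalized, hypothetical values (not measured chip data).
    Leakage (static) power is deliberately ignored.
    """
    return activity * capacitance * voltage ** 2 * frequency


# Baseline chip at normalized voltage 1.0 and normalized clock 1.0.
baseline = dynamic_power(voltage=1.0, frequency=1.0)

# Same clock, half the supply voltage (an idealized assumption).
half_voltage = dynamic_power(voltage=0.5, frequency=1.0)

# Fraction of baseline dynamic power used at half voltage.
ratio = half_voltage / baseline
print(f"Dynamic power at half voltage: {ratio:.2f}x baseline")
```

Under these assumptions, halving the voltage cuts dynamic power to a quarter, which is why lower voltage leaves room for more compute in the same power budget. In practice the achievable clock frequency usually falls as voltage drops, so real gains depend on the design.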

💬 Notable Quotes

"There's a certain level of naivety required to think that you could build a chip better than every other AI chip ever built... It turns out that everybody's answers are extremely siloed to a set of constraints that aren't true anymore."
"It took a while to get to the primitives that we think really matter for scaling inference. We realized that fundamentally, if you want to run a majority of the tokens in the world, you need to do three things: build a chip with the most flops in a given power budget, build a chip that has the lowest latency between chips, and produce as much of it as possible."
"When you have machines that can think almost as good as the best humans, you're entering a unique moment. The cost of producing intelligence is dramatically cheaper than the value of that intelligence, so we are in a multi-year, probably multi-decade supply shortage of these tokens."