This episode explores the rise of open-weight AI models as competitive alternatives to closed-source systems, offering advantages in customization and data privacy. Benny Chen, co-founder of Fireworks AI, discusses his company's platform for serving and customizing these models at scale, highlighting their advanced inference infrastructure, multi-hardware support, and the transformative impact of reinforcement fine-tuning.
Summarized by Podsumo
Open-weight models are AI systems whose trained parameters are publicly released, giving organizations direct control, customization, and data privacy; they are becoming increasingly competitive for production workloads.
Fireworks AI offers a platform for serving and customizing open-weight models at scale, featuring optimized inference infrastructure, multi-hardware support (Nvidia and AMD), custom kernels (Fire Attention), speculative decoding, and the 3D Fire Optimizer.
Fireworks AI processes *13 trillion tokens a day*, a sign of how widely open-weight models have been adopted and scaled.
Reinforcement Fine-Tuning (RFT) is presented as a crucial 'new lever' for model improvement: it enables efficient customization and turns the evaluation process into a valuable, enduring asset, supported by Fireworks AI's open-source `Eval Protocol`.
Benny Chen emphasizes the importance of building robust evaluation assets to 'define good' for AI ROI, allowing customers to confidently select and train models, rather than solely focusing on cost.
"Open source models are very strong, definitely it was a leap of faith from us to focus on open source models. At the same time, I'm like, you're pleasantly surprised on how strong the open source model has been getting to a point where it is like price competitive against closed source models."
"At the end of the day, if you know how to evaluate your model, you have all the power. You get to decide which supplier to use in what setting and when and that power is very, very important to a lot of our customers."
"Reinforcement learning is a new lever that the industry found while the pre-training sort of like a free-writing slowdown."