NVIDIA AI Podcast

Snap’s Secret to Processing 10 Petabytes a Day: GPU-Accelerated Spark | NVIDIA AI Podcast Ep. 298

24 min

Snap's head of engineering platforms, Prudviya Bhattala, explains how the company migrated its 10-petabyte-per-day data pipeline from CPU-based Apache Spark to GPU-accelerated Spark, using NVIDIA's RAPIDS Accelerator for Apache Spark (Spark RAPIDS) on Google Kubernetes Engine (GKE). The migration ran on GPUs from Snap's online inference capacity that would otherwise have sat idle, and it resulted in a 76% cost reduction, 62% fewer cores, and an 80% smaller memory footprint, all without code changes to existing jobs.
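The "no code changes" point follows from how the RAPIDS Accelerator is enabled: it is a Spark plugin turned on through job configuration, not through edits to the job itself. As a rough sketch of what that can look like, the helper below assembles a `spark-submit` command with the plugin switched on. The configuration keys are the documented Spark and RAPIDS Accelerator ones; the helper name `build_submit_command`, the jar path, the job filename, and the GPU amounts are hypothetical placeholders, not Snap's actual settings.

```python
import shlex

# Documented Spark / RAPIDS Accelerator configuration keys.
# The values are illustrative assumptions, not Snap's production settings.
RAPIDS_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",   # loads the accelerator plugin
    "spark.rapids.sql.enabled": "true",              # run supported SQL ops on the GPU
    "spark.executor.resource.gpu.amount": "1",       # one GPU per executor (assumed)
    "spark.task.resource.gpu.amount": "0.25",        # four tasks share a GPU (assumed)
}


def build_submit_command(app, plugin_jar, extra_conf=None):
    """Return spark-submit arguments that enable the plugin for an unmodified job.

    `app` and `plugin_jar` are placeholders supplied by the caller;
    `extra_conf` overrides or extends the defaults above.
    """
    conf = {**RAPIDS_CONF, **(extra_conf or {})}
    cmd = ["spark-submit", "--jars", plugin_jar]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # Hypothetical job and jar path, shown only to illustrate the shape of the command.
    args = build_submit_command("existing_job.py", "/opt/jars/rapids-4-spark.jar")
    print(shlex.join(args))
```

The job script itself is passed through untouched; only the submit-time configuration differs from a CPU run, and setting `spark.rapids.sql.enabled` to `false` sends the same job back to the CPU path.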

Summarized by Podsumo

