How to Triple Your Model’s Inference Speed
Watch on Demand
Learn how to optimize your PyTorch model for inference using DeepSpeed Inference.
Serving a large model in production with high reliability, high concurrency, and low latency is essential for businesses that need to respond quickly to users and handle thousands, even millions, of daily requests.
Join Lightning’s Sebastian Raschka, Neil Bhatt, and Thomas Chaton as they walk through the successful (and unsuccessful) experiments the team ran to optimize a Stable Diffusion model, ultimately increasing inference speed by 3x.
Learn how to: