How to Triple Your Model’s Inference Speed
Watch on Demand
Learn how to optimize your PyTorch model for inference using DeepSpeed Inference.
Serving a large model in production with high reliability, high concurrency, and low latency is essential for businesses that need to respond quickly to users and handle thousands, even millions, of daily requests.
Join Lightning’s Sebastian Raschka, Neil Bhatt, and Thomas Chaton as they walk through the successful (and unsuccessful) experiments the team ran to optimize a Stable Diffusion model, ultimately increasing inference speed by 3x.
Learn how to: