How to Triple Your Model’s Inference Speed

Watch on Demand

Learn how to optimize your PyTorch model for inference using DeepSpeed Inference.

About this Webinar

Serving a large model in production with high reliability and concurrency, consistent quality, and low inference time is essential for businesses to respond quickly to users and handle thousands, even millions, of daily requests.

Join Lightning’s Sebastian Raschka, Neil Bhatt, and Thomas Chaton as they walk through the successful (and unsuccessful) experiments they ran to optimize a Stable Diffusion model, ultimately increasing inference speed by 3x.

Learn how to:

  • Apply lessons from these experiments to take your model-serving capabilities to the next level
  • Batch requests effectively to improve inference performance
  • Identify which PyTorch optimizations were most effective, and which were least effective
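To illustrate the batching idea from the list above, here is a minimal sketch, independent of any specific framework: when each model call carries a fixed per-call overhead (Python dispatch, kernel launches), grouping requests into one batched call amortizes that cost across the batch. The `toy_model` function is a hypothetical stand-in, not the actual model used in the webinar.

```python
def run_one_at_a_time(model, requests):
    # One model call per request: the fixed per-call overhead is paid N times.
    return [model([r])[0] for r in requests]

def run_batched(model, requests, batch_size=8):
    # Group requests into batches: the fixed overhead is paid once per batch.
    results = []
    for i in range(0, len(requests), batch_size):
        results.extend(model(requests[i:i + batch_size]))
    return results

# Hypothetical "model" that doubles each input; a real model would pay a
# much larger per-call cost, which is what makes batching pay off.
def toy_model(batch):
    return [x * 2 for x in batch]
```

Both paths return identical results; the batched path simply makes fewer model calls, which is where the speedup comes from in practice.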

Presented by

Neil Bhatt
Director of Product | Lightning
Thomas Chaton
Staff Research Engineer | Lightning
Sebastian Raschka
Lead AI Educator | Lightning