How To Scale Model Serving in Production

Thursday, January 12th, 2023 | 7 AM PT

Learn how you can scale model serving on Lightning AI with dynamic batching and autoscaling.

About this Webinar

Serving large models in production with high concurrency and throughput is essential for businesses that need to respond quickly to users and handle large volumes of requests.

Join Lightning’s Neil Bhatt and Sherin Thomas to learn how we used dynamic batching and autoscaling to serve Stable Diffusion in production and scale it to handle more than 1,000 concurrent users.

Learn how to:

Improve throughput with dynamic batching
Implement horizontal scaling that adds and removes replicas as traffic rises and falls, saving you money
Adjust scaling parameters with minimal development experience

Presented By

Neil Bhatt | Director of Product | Lightning

Sherin Thomas | Senior Software Engineer | Lightning