How To Scale Model Serving in Production
Thursday, January 12th, 2023 | 7 AM PT
Learn how you can scale model serving on Lightning AI with dynamic batching and autoscaling.
About this Webinar
Businesses that serve large models in production need high concurrency and throughput to respond quickly to users and stay available under a large number of requests.
Join Lightning’s Neil Bhatt and Sherin Thomas to learn how we used dynamic batching and autoscaling to serve Stable Diffusion in production and scale it to more than 1,000 concurrent users.
Learn how to:
- Improve throughput with dynamic batching
- Implement horizontal scaling that scales up and down with traffic, saving you money
- Adjust scaling parameters with minimal development experience
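For readers new to these ideas, here is a minimal Python sketch of the two techniques named in the list above. It is an illustration only, not Lightning's implementation or API: the names collect_batch, desired_replicas, fake_model, and every parameter are hypothetical, and the model is a stand-in function. Dynamic batching waits briefly to group several requests into one model call; horizontal autoscaling picks a replica count from the current load, clamped to a minimum and maximum.

```python
import math
import queue
import time
from typing import List


def collect_batch(requests: queue.Queue, max_batch_size: int, timeout_s: float) -> List[str]:
    """Wait for one request, then gather more until the batch is full or the timeout elapses."""
    batch = [requests.get()]  # block until at least one request arrives
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time; serve what we have
    return batch


def desired_replicas(pending_requests: int, target_per_replica: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Pick a replica count proportional to load, clamped to [min_replicas, max_replicas]."""
    needed = math.ceil(pending_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))


def fake_model(prompts: List[str]) -> List[str]:
    """Hypothetical stand-in for a model that processes a whole batch in one call."""
    return [f"image for: {p}" for p in prompts]


# Five queued requests are served in two model calls instead of five.
pending = queue.Queue()
for prompt in ["a cat", "a dog", "a boat", "a tree", "a house"]:
    pending.put(prompt)

first = collect_batch(pending, max_batch_size=4, timeout_s=0.05)
second = collect_batch(pending, max_batch_size=4, timeout_s=0.05)
results = fake_model(first) + fake_model(second)
```

In this sketch, max_batch_size and timeout_s trade latency for throughput: a longer timeout fills larger batches but makes the first request in each batch wait longer. The values passed to desired_replicas (target load per replica, minimum, maximum) are the kind of scaling parameters the last bullet refers to.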
Presented By
Neil Bhatt | Director of Product | Lightning
Sherin Thomas | Senior Software Engineer | Lightning