{"id":5647070,"date":"2022-12-20T17:44:22","date_gmt":"2022-12-20T22:44:22","guid":{"rendered":"https:\/\/lightning.ai\/pages\/?p=5647070"},"modified":"2023-03-07T17:13:02","modified_gmt":"2023-03-07T22:13:02","slug":"dynamic-batching-autoscaling","status":"publish","type":"post","link":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/","title":{"rendered":"Scale Model Serving with Dynamic Batching and Autoscaling"},"content":{"rendered":"<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Key takeaways<\/h3> How to scale model serving in production with dynamic batching and autoscaling. <\/div>\n<p style=\"text-align: right;\"><a target=\"blank\" href=\"https:\/\/lightning.ai\/component\/ZJ9fgJI226-Autoscaler\" class=\"d-inline-block btn btn-\">Get started with the Lightning Autoscaler<\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Serving large models in production with high concurrency and throughput is essential for businesses to respond quickly to users and be available to handle a large number of requests. 
Below, we share how we took advantage of Dynamic Batching and <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/github.com\/Lightning-AI\/stable-diffusion-deploy\/blob\/main\/app.py#L200\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"1\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-1500893511\">Autoscaling<\/span><\/a> to serve Stable Diffusion in production and scaled it to handle over 1000 concurrent users.<\/p>\n<div id=\"attachment_5647075\" style=\"width: 2410px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647075\" class=\"size-full wp-image-5647075\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-batch-size.png\" alt=\"\" width=\"2400\" height=\"1600\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-batch-size.png 2400w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-batch-size-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-batch-size-1024x683.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-batch-size-1536x1024.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-batch-size-2048x1365.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-batch-size-300x200@2x.png 600w\" sizes=\"(max-width: 2400px) 100vw, 2400px\" \/><p id=\"caption-attachment-5647075\" class=\"wp-caption-text\">Throughput comparison with batch sizes 1 and 12.<\/p><\/div>\n<p>&nbsp;<\/p>\n<h2>Dynamic Batching<\/h2>\n<p>Batch processing increases the throughput or number of requests processed in a certain timeframe with some added latency. Dynamic batching is a method in which we aggregate requests and batch them for parallel processing. 
In our recent deployment of Stable Diffusion, we implemented a dynamic batching system that improved throughput by 50%. Before batching, we could handle ~30 concurrent users; after enabling batching with a maximum batch size of 12, we could handle 60 concurrent users per process. The batch size can be fine-tuned based on the capability of your hardware.<\/p>\n<p>To build a dynamic batching system, we took a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Publish%E2%80%93subscribe_pattern\">producer-consumer<\/a> approach in which we continuously add incoming requests to a queue, and the queue items are consumed for model prediction in a background <span class=\"mui_tooltip wrapped\"><span class=\"tooltip_wrap\">process<img decoding=\"async\" class=\"ml-1\" width=\"12.5\" height=\"12.5\" alt=\"tooltip icon\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/themes\/lightning-wp\/assets\/images\/tooltip.svg\"><span class=\"tooltip_content\">This is technically a Python coroutine, but we&#8217;ll use the term &#8216;process&#8217; for simplicity.<\/span><\/span><\/span>. 
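<\/p>
<p>As an illustrative sketch only (not the actual Muse implementation; <code>predict_batch<\/code>, the 50&nbsp;ms wait, and the other names here are assumptions), the queue-and-consumer step described above might look like this with <code>asyncio<\/code>:<\/p>

```python
import asyncio
import time

MAX_BATCH_SIZE = 12  # matches the batch size used in this deployment
MAX_WAIT = 0.05      # seconds to spend filling a batch (illustrative value)

request_queue = asyncio.Queue()  # incoming (request_id, prompt) pairs
results = {}                     # request_id -> prediction ("result dictionary")

def predict_batch(prompts):
    # Placeholder for the real batched model call (e.g. a diffusion pipeline).
    return [f"image for {p}" for p in prompts]

async def batch_consumer():
    """Background coroutine: drain the queue into batches and publish results."""
    while True:
        # Block until at least one request arrives, then aggregate more
        # until the batch is full or the wait deadline expires.
        batch = [await request_queue.get()]
        deadline = time.monotonic() + MAX_WAIT
        while len(batch) < MAX_BATCH_SIZE and time.monotonic() < deadline:
            try:
                batch.append(request_queue.get_nowait())
            except asyncio.QueueEmpty:
                await asyncio.sleep(0.001)
        ids, prompts = zip(*batch)
        for req_id, image in zip(ids, predict_batch(list(prompts))):
            results[req_id] = image  # publish so the request handler can respond
```

<p>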
Once the prediction result is ready, the producer publishes the prediction to a <code>result dictionary<\/code> and finally returns the generated image to the client.<\/p>\n<div id=\"attachment_5647076\" style=\"width: 2410px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647076\" class=\"size-full wp-image-5647076\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests.png\" alt=\"\" width=\"2400\" height=\"1600\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests.png 2400w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-1024x683.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-1536x1024.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-2048x1365.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-300x200@2x.png 600w\" sizes=\"(max-width: 2400px) 100vw, 2400px\" \/><p id=\"caption-attachment-5647076\" class=\"wp-caption-text\">Fig. Flow of requests through the queue for batch aggregation<\/p><\/div>\n<p>&nbsp;<\/p>\n<h2>Autoscaling<\/h2>\n<p>With dynamic batching, concurrency is limited to the maximum batch size a model server can handle without running out of memory on the GPU.<\/p>\n<p>We leveraged multiple GPUs by running the model server on each of the GPUs in parallel. We implemented an autoscaling feature that automatically increases the number of model servers with high traffic and downscales when traffic is idle. 
We are also able to configure a minimum number of servers that must be running at any given time.<\/p>\n<p>To implement Autoscaling, we run a job every 30 seconds to check the traffic (defined as the number of current requests in the queue). Based on this traffic, we either upscale or downscale the model servers.<\/p>\n<div id=\"attachment_5647077\" style=\"width: 2410px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647077\" class=\"size-full wp-image-5647077\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-1.png\" alt=\"\" width=\"2400\" height=\"1600\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-1.png 2400w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-1-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-1-1024x683.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-1-1536x1024.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-1-2048x1365.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Flow-of-requests-1-300x200@2x.png 600w\" sizes=\"(max-width: 2400px) 100vw, 2400px\" \/><p id=\"caption-attachment-5647077\" class=\"wp-caption-text\">Fig. Autoscaling loops run after an interval to adjust the parallel servers.<\/p><\/div>\n<p>Just by increasing the number of servers to 4, we <span class=\"notion-enable-hover\" data-token-index=\"1\" data-reactroot=\"\">scaled concurrency from 60 users to 400<\/span>. 
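<\/p>
<p>A minimal sketch of this kind of scaling check (illustrative only; the thresholds and helper names are assumptions, and the production logic lives in the open-source Lightning Autoscaler component):<\/p>

```python
import math
import time

MIN_WORKERS = 1        # minimum number of servers kept running at all times
MAX_WORKERS = 4        # cap used in the deployment described above
REQS_PER_WORKER = 12   # one full dynamic batch per server (assumption)

def desired_workers(queue_length: int) -> int:
    """Map current traffic (requests waiting in the queue) to a server count."""
    needed = math.ceil(queue_length / REQS_PER_WORKER)
    return min(MAX_WORKERS, max(MIN_WORKERS, needed))

def autoscale_loop(get_queue_length, set_worker_count, interval=30):
    """Check traffic on an interval and upscale or downscale (runs forever)."""
    while True:
        set_worker_count(desired_workers(get_queue_length()))
        time.sleep(interval)
```

<p>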
This can be further upscaled to handle additional traffic.<\/p>\n<div id=\"attachment_5647078\" style=\"width: 2410px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647078\" class=\"size-full wp-image-5647078\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-servers.png\" alt=\"\" width=\"2400\" height=\"1600\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-servers.png 2400w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-servers-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-servers-1024x683.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-servers-1536x1024.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-servers-2048x1365.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Users-vs.-servers-300x200@2x.png 600w\" sizes=\"(max-width: 2400px) 100vw, 2400px\" \/><p id=\"caption-attachment-5647078\" class=\"wp-caption-text\">Fig. Overall throughput comparison with parallel servers at 1 and 4<\/p><\/div>\n<p>&nbsp;<\/p>\n<h2>Performance Testing<\/h2>\n<p>We ran the tests using Locust, an open-source performance testing library for Python, which we deployed as a component in our Lightning App. 
(Remember, a Lightning App is just organized Python code you can run anywhere, so adding this kind of third-party integration is simple!)<\/p>\n<div id=\"attachment_5647079\" style=\"width: 1255px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647079\" class=\"size-full wp-image-5647079\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Performance-testing-chart.png\" alt=\"\" width=\"1245\" height=\"1110\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Performance-testing-chart.png 1245w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Performance-testing-chart-300x267.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Performance-testing-chart-1024x913.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Performance-testing-chart-300x267@2x.png 600w\" sizes=\"(max-width: 1245px) 100vw, 1245px\" \/><p id=\"caption-attachment-5647079\" class=\"wp-caption-text\">Fig. Performance testing chart<\/p><\/div>\n<p>With the configuration of a maximum of 4 model servers and dynamic batching of size 12, we were able to increase the number of concurrent users this deployment could handle more than tenfold. 
With these techniques, the deployment can be scaled out further to handle thousands of concurrent requests.<\/p>\n<p>If you want to deploy your own Stable Diffusion server with a custom batch size and number of workers, you can clone the app from the <a href=\"https:\/\/lightning.ai\/app\/HvUwbEG90E-Muse\">App Gallery<\/a> and run this command from your terminal:<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-bash \">lightning run app app.py --env MUSE_MIN_WORKERS=2 --env MUSE_GPU_TYPE=gpu-fast<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<p style=\"text-align: center;\"><a target=\"blank\" href=\"https:\/\/lightning.ai\/component\/ZJ9fgJI226-Autoscaler\" class=\"d-inline-block btn btn-\">Add autoscaling to your ML application with our open-source component!<\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Have questions about concurrency, latency, or other related topics? 
You can ask your questions in our community <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/join.slack.com\/t\/pytorch-lightning\/shared_invite\/zt-1dm4phlc0-84Jv9_8Mp_tWraICOJ467Q\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"1\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-1230640743\">Slack<\/span><\/a> or <a class=\"notion-link-token notion-enable-hover\" href=\"https:\/\/forums.pytorchlightning.ai\/\" target=\"_blank\" rel=\"noopener noreferrer\" data-token-index=\"3\" data-reactroot=\"\"><span class=\"link-annotation-unknown-block-id-589320189\">Forum<\/span><\/a>. <span role=\"img\" aria-label=\"\ud83d\udc9c\">\ud83d\udc9c<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Serving large models in production with high concurrency and throughput is essential for businesses to respond quickly to users and be available to handle a large number of requests. Below, we share how we took advantage of Dynamic Batching and Autoscaling to serve Stable Diffusion in production and scaled it to handle over 1000<a class=\"excerpt-read-more\" href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/\" title=\"ReadScale Model Serving with Dynamic Batching and Autoscaling\">&#8230; Read more &raquo;<\/a><\/p>\n","protected":false},"author":16,"featured_media":5647081,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[41],"tags":[96,132,131,97,133],"glossary":[],"acf":{"additional_authors":false,"hide_from_archive":false,"content_type":"Blog Post","custom_styles":""},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Scale Model Serving with Dynamic Batching and Autoscaling<\/title>\n<meta name=\"description\" content=\"Learn how to scale 
your model serving in production by using dynamic batching and autoscaling, ensuring high concurrency and throughput.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Scale Model Serving with Dynamic Batching and Autoscaling\" \/>\n<meta property=\"og:description\" content=\"Learn how to scale your model serving in production by using dynamic batching and autoscaling, ensuring high concurrency and throughput.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/\" \/>\n<meta property=\"og:site_name\" content=\"Lightning AI\" \/>\n<meta property=\"article:published_time\" content=\"2022-12-20T22:44:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-03-07T22:13:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2175\" \/>\n\t<meta property=\"og:image:height\" content=\"1125\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"JP Hennessy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:site\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"JP Hennessy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/\"},\"author\":{\"name\":\"JP Hennessy\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\"},\"headline\":\"Scale Model Serving with Dynamic Batching and Autoscaling\",\"datePublished\":\"2022-12-20T22:44:22+00:00\",\"dateModified\":\"2023-03-07T22:13:02+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/\"},\"wordCount\":683,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png\",\"keywords\":[\"ai\",\"autoscaling\",\"dynamic batching\",\"ml\",\"serving\"],\"articleSection\":[\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/\",\"url\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/\",\"name\":\"Scale Model Serving with Dynamic Batching and 
Autoscaling\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png\",\"datePublished\":\"2022-12-20T22:44:22+00:00\",\"dateModified\":\"2023-03-07T22:13:02+00:00\",\"description\":\"Learn how to scale your model serving in production by using dynamic batching and autoscaling, ensuring high concurrency and throughput.\",\"breadcrumb\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#primaryimage\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png\",\"width\":2175,\"height\":1125},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lightning.ai\/pages\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Scale Model Serving with Dynamic Batching and Autoscaling\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lightning.ai\/pages\/#website\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"name\":\"Lightning AI\",\"description\":\"The platform for teams to build 
AI.\",\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lightning.ai\/pages\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\",\"name\":\"Lightning AI\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"width\":1744,\"height\":856,\"caption\":\"Lightning AI\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/LightningAI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\",\"name\":\"JP Hennessy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"caption\":\"JP Hennessy\"},\"url\":\"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Scale Model Serving with Dynamic Batching and Autoscaling","description":"Learn how to scale your model serving in production by using dynamic batching and autoscaling, ensuring high concurrency and throughput.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/","og_locale":"en_US","og_type":"article","og_title":"Scale Model Serving with Dynamic Batching and Autoscaling","og_description":"Learn how to scale your model serving in production by using dynamic batching and autoscaling, ensuring high concurrency and throughput.","og_url":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/","og_site_name":"Lightning AI","article_published_time":"2022-12-20T22:44:22+00:00","article_modified_time":"2023-03-07T22:13:02+00:00","og_image":[{"width":2175,"height":1125,"url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png","type":"image\/png"}],"author":"JP Hennessy","twitter_card":"summary_large_image","twitter_creator":"@LightningAI","twitter_site":"@LightningAI","twitter_misc":{"Written by":"JP Hennessy","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#article","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/"},"author":{"name":"JP Hennessy","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6"},"headline":"Scale Model Serving with Dynamic Batching and Autoscaling","datePublished":"2022-12-20T22:44:22+00:00","dateModified":"2023-03-07T22:13:02+00:00","mainEntityOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/"},"wordCount":683,"commentCount":0,"publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png","keywords":["ai","autoscaling","dynamic batching","ml","serving"],"articleSection":["Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/","url":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/","name":"Scale Model Serving with Dynamic Batching and 
Autoscaling","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#primaryimage"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png","datePublished":"2022-12-20T22:44:22+00:00","dateModified":"2023-03-07T22:13:02+00:00","description":"Learn how to scale your model serving in production by using dynamic batching and autoscaling, ensuring high concurrency and throughput.","breadcrumb":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#primaryimage","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/12\/Autoscaling-featured.png","width":2175,"height":1125},{"@type":"BreadcrumbList","@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lightning.ai\/pages\/"},{"@type":"ListItem","position":2,"name":"Scale Model Serving with Dynamic Batching and Autoscaling"}]},{"@type":"WebSite","@id":"https:\/\/lightning.ai\/pages\/#website","url":"https:\/\/lightning.ai\/pages\/","name":"Lightning AI","description":"The platform for teams to build 
AI.","publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lightning.ai\/pages\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/lightning.ai\/pages\/#organization","name":"Lightning AI","url":"https:\/\/lightning.ai\/pages\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","width":1744,"height":856,"caption":"Lightning AI"},"image":{"@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/LightningAI"]},{"@type":"Person","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6","name":"JP Hennessy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","caption":"JP 
Hennessy"},"url":"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/"}]}},"_links":{"self":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5647070"}],"collection":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/comments?post=5647070"}],"version-history":[{"count":0,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5647070\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media\/5647081"}],"wp:attachment":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media?parent=5647070"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/categories?post=5647070"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/tags?post=5647070"},{"taxonomy":"glossary","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/glossary?post=5647070"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}