{"id":5647172,"date":"2023-01-24T16:33:26","date_gmt":"2023-01-24T21:33:26","guid":{"rendered":"https:\/\/lightning.ai\/pages\/?p=5647172"},"modified":"2023-07-28T11:27:39","modified_gmt":"2023-07-28T15:27:39","slug":"serve-stable-diffusion-three-times-faster","status":"publish","type":"post","link":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/","title":{"rendered":"Serve Stable Diffusion Three Times Faster"},"content":{"rendered":"<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Learn how to:<\/h3> Optimize your PyTorch model for inference using DeepSpeed Inference.<\/p>\n<p><\/div>\n<p>Serving large models in production with high <span class=\"mui_tooltip wrapped\"><span class=\"tooltip_wrap\">concurrency<img decoding=\"async\" class=\"ml-1\" width=\"12.5\" height=\"12.5\" alt=\"tooltip icon\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/themes\/lightning-wp\/assets\/images\/tooltip.svg\"><span class=\"tooltip_content\">the ability to serve multiple simultaneous inference requests<\/span><\/span><\/span> and <span class=\"mui_tooltip wrapped\"><span class=\"tooltip_wrap\">throughput<img decoding=\"async\" class=\"ml-1\" width=\"12.5\" height=\"12.5\" alt=\"tooltip icon\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/themes\/lightning-wp\/assets\/images\/tooltip.svg\"><span class=\"tooltip_content\">units of data processed per unit of time<\/span><\/span><\/span> is essential for businesses to respond quickly to users and be available to handle a large number of requests. 
Previously, <a href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/dynamic-batching-autoscaling\/\">we&#8217;ve shown you how to scale model serving with dynamic batching and autoscaling<\/a> in order to serve Stable Diffusion and scale your performance to handle over 1000 concurrent users.<\/p>\n<p>Below, we explore how we leveraged several optimizations from PyTorch and other third-party libraries such as <a href=\"https:\/\/github.com\/microsoft\/DeepSpeed\">DeepSpeed<\/a> to reduce the cost of serving Stable Diffusion without significant impact on the quality of the images generated.<\/p>\n<p>Using the following prompts, here are some examples of the generated images before and after optimization:<\/p>\n<p style=\"text-align: center;\"><em><strong>&#8220;astronaut riding a horse, digital art, epic lighting, highly-detailed masterpiece trending HQ&#8221;<\/strong><\/em><\/p>\n<div id=\"attachment_5647173\" style=\"width: 1810px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647173\" class=\"size-full wp-image-5647173\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/Default-vs.-Optimized-images.png\" alt=\"\" width=\"1800\" height=\"1200\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/Default-vs.-Optimized-images.png 1800w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/Default-vs.-Optimized-images-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/Default-vs.-Optimized-images-1024x683.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/Default-vs.-Optimized-images-1536x1024.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/Default-vs.-Optimized-images-300x200@2x.png 600w\" sizes=\"(max-width: 1800px) 100vw, 1800px\" \/><p id=\"caption-attachment-5647173\" class=\"wp-caption-text\">No significant loss in image 
quality after optimizing.<\/p><\/div>\n<p>As the example above shows, we observed no significant change or loss in the quality of the generated images despite speeding up inference by more than 3x.<\/p>\n<p>We focused on optimizing the original Stable Diffusion and managed to reduce serving time from 6.4 to 2.09 seconds for batch size 1 on an A10, one of the most powerful and cost-effective machines available on the Lightning Platform. All measurements were taken in production using this <a href=\"https:\/\/github.com\/Lightning-AI\/DiffusionWithAutoscaler\/blob\/main\/app.py\">server<\/a> and <a href=\"https:\/\/github.com\/Lightning-AI\/DiffusionWithAutoscaler\/blob\/main\/loadtest\/app.py\">load testing app<\/a>.<\/p>\n<p>(In case you&#8217;re wondering how much time these optimizations can save you, the same workload took 19 seconds on an M1 Mac Metal GPU and 134 seconds on an M1 Mac CPU.)<\/p>\n<p>&nbsp;<\/p>\n<h2>PyTorch Optimizations<\/h2>\n<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Optimization #1<\/h3> Use <code>torch.float16<\/code> instead of <code>torch.float32<\/code> with <a href=\"https:\/\/pytorch.org\/docs\/stable\/amp.html?highlight=autocast#torch.autocast\">mixed precision<\/a> from PyTorch.<br \/>\n<b>Result:<\/b> 40% gain in inference speed<\/div>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">\n\nimport torch<br \/>\nfrom torch import autocast\n\nmodel = model.to(device=\"cuda\", dtype=torch.float16)\n\n# Mixed precision<br \/>\nwith autocast(\"cuda\"):<br \/>\n    data = ...<br \/>\n    model(data.to(device=\"cuda\"))\n\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Optimization #2<\/h3> Use <code>torch.inference_mode<\/code> (where the model achieves better performance by 
disabling view tracking and version counter bumps) or <code>torch.no_grad<\/code>.<br \/>\n<b>Result:<\/b> &lt;1% gain in inference speed<\/div>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">\n\nimport torch<br \/>\nfrom torch import autocast, inference_mode, no_grad\n\nmodel = model.to(device=\"cuda\", dtype=torch.float16)\n\n# Inference mode<br \/>\nwith inference_mode():<br \/>\n    with autocast(\"cuda\"):<br \/>\n        data = ...<br \/>\n        model(data.to(device=\"cuda\"))\n\n# No gradients mode<br \/>\nwith no_grad():<br \/>\n    with autocast(\"cuda\"):<br \/>\n        data = ...<br \/>\n        model(data.to(device=\"cuda\"))\n\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Optimization #3<\/h3> Use CUDA Graphs.<br \/>\n<b>Result:<\/b> 5% gain in inference speed<br \/>\nIn this technique, the graph of operations is captured and replayed at once, rather than as a sequence of individually-launched operations. 
This reduces overhead because execution stays on the GPU instead of returning to Python between kernel launches.<\/div>\n<div id=\"attachment_5647195\" style=\"width: 1810px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647195\" class=\"size-full wp-image-5647195\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/benefits-of-CUDA-graphs-1.png\" alt=\"\" width=\"1800\" height=\"1200\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/benefits-of-CUDA-graphs-1.png 1800w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/benefits-of-CUDA-graphs-1-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/benefits-of-CUDA-graphs-1-1024x683.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/benefits-of-CUDA-graphs-1-1536x1024.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/benefits-of-CUDA-graphs-1-300x200@2x.png 600w\" sizes=\"(max-width: 1800px) 100vw, 1800px\" \/><p id=\"caption-attachment-5647195\" class=\"wp-caption-text\">Benefits of using CUDA graphs.<\/p><\/div>\n<p>&nbsp;<\/p>\n<p>If you&#8217;ve used TensorFlow before, this should look very familiar: you create placeholders and capture the static graph applied to them. To re-evaluate, you copy new data into the placeholders.<\/p>\n<p>Here&#8217;s how this works in two steps:<\/p>\n<p>&nbsp;<\/p>\n<h3>Step 1: Capture the PyTorch operations<\/h3>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">\n\n# 1. Placeholder inputs used for capture<br \/>\nplaceholder_input = torch.randn(N, D_in, device='cuda')\n\n# 2. 
Capture operations<br \/>\ng = torch.cuda.CUDAGraph()<br \/>\nwith torch.cuda.graph(g):<br \/>\n    # some torch operations<br \/>\n    placeholder_output = fn(placeholder_input)\n\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<h3>Step 2: Replay the graph<\/h3>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">\n\nreal_input = torch.rand_like(placeholder_input)\n\nplaceholder_input.copy_(real_input)<br \/>\ng.replay()<br \/>\nprint(placeholder_output)\n\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<p>We applied this mechanism to the CLIP text encoder, the U-Net, and the VAE portions of the model. To learn more about the architecture of Stable Diffusion, <a href=\"https:\/\/jalammar.github.io\/illustrated-stable-diffusion\/\">you can read more in this article<\/a>.<\/p>\n<div id=\"attachment_5647196\" style=\"width: 1810px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647196\" class=\"size-full wp-image-5647196\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time.png\" alt=\"\" width=\"1800\" height=\"1200\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time.png 1800w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time-1024x683.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time-1536x1024.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time-300x200@2x.png 600w\" sizes=\"(max-width: 1800px) 100vw, 1800px\" \/><p 
id=\"caption-attachment-5647196\" class=\"wp-caption-text\">Inference speed with various optimizations. (warmed up over 5 inferences, results averaged over 10 inferences)<\/p><\/div>\n<p>&nbsp;<\/p>\n<h2>DeepSpeed Inference<\/h2>\n<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">DeepSpeed Inference<\/h3> Using DeepSpeed Inference introduces several features to efficiently serve transformer-based PyTorch models with custom fused GPU kernels.<br \/>\n<b>Result:<\/b> 44% gain in inference speed<\/p>\n<p>Learn more with <a href=\"https:\/\/www.deepspeed.ai\/tutorials\/inference-tutorial\/\">this tutorial<\/a>.<\/div>\n<p>&nbsp;<\/p>\n<p>We use DeepSpeed inference as follows:<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">\n\nimport deepspeed\n\nmodel = ...\n\n# Initialize the DeepSpeed-Inference engine<br \/>\nds_engine = deepspeed.init_inference(model, dtype=torch.half)\n\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<p>Behind the scenes, DeepSpeed Inference replaces any layers with their optimized versions if they match DeepSpeed internal registered layers. 
Only models from Hugging Face or timm are pre-registered and supported out of the box by DeepSpeed Inference.<\/p>\n<p>Because we&#8217;re using Stable Diffusion directly from <a href=\"https:\/\/github.com\/CompVis\/stable-diffusion\">its GitHub repo<\/a>, we first need to replace its layers with the DeepSpeed-optimized transformer layers ourselves.<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">\n\nfrom ldm.modules.attention import CrossAttention, BasicTransformerBlock\n\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<p>First, we replace <code>CrossAttention<\/code> from Stable Diffusion with DeepSpeed <code>DeepSpeedDiffusersAttention<\/code>. Here&#8217;s the code to do so:<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">\n\nimport torch<br \/>\nfrom deepspeed.ops.transformer.inference.diffusers_attention import DeepSpeedDiffusersAttention<br \/>\nimport deepspeed.ops.transformer as transformer_inference\n\ndef replace_attn(child, policy, fp16=True):<br \/>\n    policy_attn = policy.attention(child)<br \/>\n    qkvw, attn_ow, attn_ob, hidden_size, heads = policy_attn\n\n    config = transformer_inference.DeepSpeedInferenceConfig(<br \/>\n        hidden_size=hidden_size,<br \/>\n        heads=heads,<br \/>\n        fp16=fp16,<br \/>\n        triangular_masking=False,<br \/>\n        max_out_tokens=4096,<br \/>\n    )<br \/>\n    attn_module = DeepSpeedDiffusersAttention(config)\n\n    def transpose(data):<br \/>\n        data = data.contiguous()<br \/>\n        data.reshape(-1).copy_(data.transpose(-1, -2).contiguous().reshape(-1))<br \/>\n        data = data.reshape(data.shape[-1], data.shape[-2])<br \/>\n        data = data.to(torch.cuda.current_device())<br \/>\n        return data\n\n    attn_module.attn_qkvw.data = transpose(qkvw.data)\n\n    attn_module.attn_qkvb = None<br \/>\n    attn_module.attn_ow.data = 
transpose(attn_ow.data)<br \/>\n    attn_module.attn_ob.data.copy_(attn_ob.data.to(torch.cuda.current_device()))<br \/>\n    return attn_module\n\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<p>Next, we replace the <code>BasicTransformerBlock<\/code> from Stable Diffusion with DeepSpeed <code>DeepSpeedDiffusersTransformerBlock<\/code>. Again, here&#8217;s the code to do so:<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">\n\nfrom deepspeed.ops.transformer.inference.diffusers_transformer_block import DeepSpeedDiffusersTransformerBlock\n\ndef replace_attn_block(child, policy):<br \/>\n    config = Diffusers2DTransformerConfig()<br \/>\n    return DeepSpeedDiffusersTransformerBlock(child, config)\n\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<h2>Benchmarking<\/h2>\n<p>After performing these various optimizations, we visualized our results:<\/p>\n<div id=\"attachment_5647200\" style=\"width: 1210px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647200\" class=\"wp-image-5647200 size-full\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time-2.png\" alt=\"\" width=\"1200\" height=\"800\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time-2.png 1200w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time-2-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time-2-1024x683.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/inference-time-2-300x200@2x.png 600w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><p id=\"caption-attachment-5647200\" 
class=\"wp-caption-text\">Improvements in inference time (seconds)<\/p><\/div>\n<p>&nbsp;<\/p>\n<div id=\"attachment_5647201\" style=\"width: 1210px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647201\" class=\"size-full wp-image-5647201\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/optimization-table.png\" alt=\"\" width=\"1200\" height=\"800\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/optimization-table.png 1200w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/optimization-table-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/optimization-table-1024x683.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/optimization-table-300x200@2x.png 600w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><p id=\"caption-attachment-5647201\" class=\"wp-caption-text\">Optimization chart<\/p><\/div>\n<p>&nbsp;<\/p>\n<h2>Batching in Practice<\/h2>\n<p>Because CUDA graphs don&#8217;t support dynamic batch sizes, we didn&#8217;t account for these when we benchmarked across various batch sizes.<\/p>\n<p>Here are the optimizations we performance according to batch size:<\/p>\n<div id=\"attachment_5647202\" style=\"width: 1210px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5647202\" class=\"size-full wp-image-5647202\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/batching-table.png\" alt=\"\" width=\"1200\" height=\"800\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/batching-table.png 1200w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/batching-table-300x200.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/batching-table-1024x683.png 1024w, 
https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/batching-table-300x200@2x.png 600w\" sizes=\"(max-width: 1200px) 100vw, 1200px\" \/><p id=\"caption-attachment-5647202\" class=\"wp-caption-text\">Optimizations according to batch size<\/p><\/div>\n<p>These optimizations resulted in further inference speed improvements at larger batch sizes.<\/p>\n<p>&nbsp;<\/p>\n<h2>Conclusion<\/h2>\n<p>In this blog post, you learned how we leveraged several optimizations from PyTorch and DeepSpeed Inference to improve inference speed by over 300%.<\/p>\n<p>In the future, we&#8217;d love to explore new ideas to even further improve inference time, such as dynamic batching on the U-Net or operators trace optimization. If you want to stay in the know about our latest improvements, join us on <a href=\"https:\/\/discord.gg\/XncpTy7DSt\">Discord<\/a> or our <a href=\"https:\/\/lightning.ai\/forums\/\">Forums<\/a>!<\/p>\n<p>&nbsp;<\/p>\n<h2>Benchmark this yourself!<\/h2>\n<p>To run your own benchmarks using these optimizations, just follow these three simple steps:<\/p>\n<ol>\n<li><a href=\"https:\/\/lightning.ai\/\">Create a Lightning AI account<\/a> and receive $30USD worth of free credits<\/li>\n<li>Duplicate (fork) our <a href=\"https:\/\/lightning.ai\/app\/fcUubSZ99Q\">Autoscale Stable Diffusion Server<\/a> on your account<\/li>\n<li>Navigate to our <a href=\"https:\/\/github.com\/Lightning-AI\/DiffusionWithAutoscaler\">GitHub repo<\/a> to replicate the benchmark.<\/li>\n<\/ol>\n<a target=\"blank\" href=\"https:\/\/lightning.ai\" class=\"d-inline-block btn btn-blue\">Sign up for Lightning AI!<\/a>\n","protected":false},"excerpt":{"rendered":"<p>Serving large models in production with high and is essential for businesses to respond quickly to users and be available to handle a large number of requests. 
Previously, we&#8217;ve shown you how to scale model serving with dynamic batching and autoscaling in order to serve Stable Diffusion and scale your performance to handle over 1000<a class=\"excerpt-read-more\" href=\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/\" title=\"ReadServe Stable Diffusion Three Times Faster\">&#8230; Read more &raquo;<\/a><\/p>\n","protected":false},"author":38,"featured_media":5647204,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[106,41],"tags":[96,48,97,51,114],"glossary":[],"acf":{"additional_authors":false,"hide_from_archive":false,"content_type":"Blog Post","custom_styles":"","mathjax":false,"default_editor":true,"show_table_of_contents":false,"sticky":false},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Serve Stable Diffusion Three Times Faster<\/title>\n<meta name=\"description\" content=\"Learn how to leverage several optimizations from PyTorch and DeepSpeed to serve Stable Diffusion up to three times faster.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Serve Stable Diffusion Three Times Faster\" \/>\n<meta property=\"og:description\" content=\"Learn how to leverage several optimizations from PyTorch and DeepSpeed to serve Stable Diffusion up to three times faster.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/\" \/>\n<meta 
property=\"og:site_name\" content=\"Lightning AI\" \/>\n<meta property=\"article:published_time\" content=\"2023-01-24T21:33:26+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-07-28T15:27:39+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1160\" \/>\n\t<meta property=\"og:image:height\" content=\"600\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Thomas Chaton\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:site\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Thomas Chaton\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/\"},\"author\":{\"name\":\"Thomas Chaton\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/a5c2133ac25a788147b115979a5fc2bf\"},\"headline\":\"Serve Stable Diffusion Three Times 
Faster\",\"datePublished\":\"2023-01-24T21:33:26+00:00\",\"dateModified\":\"2023-07-28T15:27:39+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/\"},\"wordCount\":1177,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png\",\"keywords\":[\"ai\",\"DeepSpeed\",\"ml\",\"pytorch\",\"stable diffusion\"],\"articleSection\":[\"Community\",\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/\",\"url\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/\",\"name\":\"Serve Stable Diffusion Three Times Faster\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png\",\"datePublished\":\"2023-01-24T21:33:26+00:00\",\"dateModified\":\"2023-07-28T15:27:39+00:00\",\"description\":\"Learn how to leverage several optimizations from PyTorch and DeepSpeed to serve Stable Diffusion up to three times 
faster.\",\"breadcrumb\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#primaryimage\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png\",\"width\":1160,\"height\":600},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lightning.ai\/pages\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Serve Stable Diffusion Three Times Faster\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lightning.ai\/pages\/#website\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"name\":\"Lightning AI\",\"description\":\"The platform for teams to build AI.\",\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lightning.ai\/pages\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\",\"name\":\"Lightning 
AI\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"width\":1744,\"height\":856,\"caption\":\"Lightning AI\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/LightningAI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/a5c2133ac25a788147b115979a5fc2bf\",\"name\":\"Thomas Chaton\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/e8a8ea2ae1fd0f2d476f8bc75e195b3d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/e8a8ea2ae1fd0f2d476f8bc75e195b3d?s=96&d=mm&r=g\",\"caption\":\"Thomas Chaton\"},\"url\":\"https:\/\/lightning.ai\/pages\/author\/thomaschaton\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Serve Stable Diffusion Three Times Faster","description":"Learn how to leverage several optimizations from PyTorch and DeepSpeed to serve Stable Diffusion up to three times faster.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/","og_locale":"en_US","og_type":"article","og_title":"Serve Stable Diffusion Three Times Faster","og_description":"Learn how to leverage several optimizations from PyTorch and DeepSpeed to serve Stable Diffusion up to three times faster.","og_url":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/","og_site_name":"Lightning AI","article_published_time":"2023-01-24T21:33:26+00:00","article_modified_time":"2023-07-28T15:27:39+00:00","og_image":[{"width":1160,"height":600,"url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png","type":"image\/png"}],"author":"Thomas Chaton","twitter_card":"summary_large_image","twitter_creator":"@LightningAI","twitter_site":"@LightningAI","twitter_misc":{"Written by":"Thomas Chaton","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#article","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/"},"author":{"name":"Thomas Chaton","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/a5c2133ac25a788147b115979a5fc2bf"},"headline":"Serve Stable Diffusion Three Times Faster","datePublished":"2023-01-24T21:33:26+00:00","dateModified":"2023-07-28T15:27:39+00:00","mainEntityOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/"},"wordCount":1177,"commentCount":0,"publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png","keywords":["ai","DeepSpeed","ml","pytorch","stable diffusion"],"articleSection":["Community","Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/","url":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/","name":"Serve Stable Diffusion Three Times 
Faster","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#primaryimage"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png","datePublished":"2023-01-24T21:33:26+00:00","dateModified":"2023-07-28T15:27:39+00:00","description":"Learn how to leverage several optimizations from PyTorch and DeepSpeed to serve Stable Diffusion up to three times faster.","breadcrumb":{"@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#primaryimage","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/01\/SDServe-social.png","width":1160,"height":600},{"@type":"BreadcrumbList","@id":"https:\/\/lightning.ai\/pages\/community\/serve-stable-diffusion-three-times-faster\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lightning.ai\/pages\/"},{"@type":"ListItem","position":2,"name":"Serve Stable Diffusion Three Times Faster"}]},{"@type":"WebSite","@id":"https:\/\/lightning.ai\/pages\/#website","url":"https:\/\/lightning.ai\/pages\/","name":"Lightning AI","description":"The platform for teams to build 
AI.","publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lightning.ai\/pages\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/lightning.ai\/pages\/#organization","name":"Lightning AI","url":"https:\/\/lightning.ai\/pages\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","width":1744,"height":856,"caption":"Lightning AI"},"image":{"@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/LightningAI"]},{"@type":"Person","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/a5c2133ac25a788147b115979a5fc2bf","name":"Thomas Chaton","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/e8a8ea2ae1fd0f2d476f8bc75e195b3d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/e8a8ea2ae1fd0f2d476f8bc75e195b3d?s=96&d=mm&r=g","caption":"Thomas 
Chaton"},"url":"https:\/\/lightning.ai\/pages\/author\/thomaschaton\/"}]}},"_links":{"self":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5647172"}],"collection":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/users\/38"}],"replies":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/comments?post=5647172"}],"version-history":[{"count":0,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5647172\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media\/5647204"}],"wp:attachment":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media?parent=5647172"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/categories?post=5647172"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/tags?post=5647172"},{"taxonomy":"glossary","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/glossary?post=5647172"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}