{"id":5646726,"date":"2022-10-26T15:39:42","date_gmt":"2022-10-26T19:39:42","guid":{"rendered":"https:\/\/lightning.ai\/pages\/?p=5646726"},"modified":"2023-03-07T17:22:32","modified_gmt":"2023-03-07T22:22:32","slug":"distributed-training-guide","status":"publish","type":"post","link":"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/","title":{"rendered":"Guide to Distributed Training"},"content":{"rendered":"<h2>What Is Distributed Training?<\/h2>\n<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Key takeaways<\/h3> In this tutorial, you\u2019ll learn how to scale models and data across multiple devices using distributed training. <\/div>\n<p>The GPU is the most popular choice of device for rapid deep learning research, thanks to the speed, optimizations, and ease of use it offers. From PyTorch to TensorFlow, support for GPUs is built into all of today&#8217;s major deep learning frameworks. Thankfully, running experiments on a single GPU does not currently require many changes to your code. As models continue to increase in size, however, and as the data needed to train them grows exponentially, running on a single GPU begins to pose severe limitations. Whether the problem is running out of memory or slow training speed, researchers have developed strategies to overcome the limitations of single-GPU training. In this tutorial, we&#8217;ll cover how to use distributed training to scale your research to multiple GPUs.<\/p>\n<p>With the Lightning Trainer, scaling your research to multiple GPUs is easy. Even better &#8211; if a server full of GPUs isn&#8217;t enough, you can train on multiple servers (also called nodes) in parallel. 
<span style=\"color: #7345e4;\"><a style=\"color: #7345e4;\" href=\"https:\/\/pytorch-lightning.readthedocs.io\/en\/latest\/\">Lightning<\/a><\/span> takes care of this by abstracting away boilerplate code, leaving you to focus on the research you actually care about. Under the hood, <span style=\"color: #7345e4;\"><strong><a style=\"color: #7345e4;\" href=\"https:\/\/pytorch-lightning.readthedocs.io\/en\/latest\/\">Lightning<\/a><\/strong><\/span> is modular, meaning it can adapt to whatever environment you are running in (for example, a multi-GPU cluster).<\/p>\n<p>Below, we provide a theoretical overview of distributed deep learning, and then cover how Distributed Data Parallel (DDP) works internally.<\/p>\n<h2 style=\"text-align: center;\">\u00b7 \u00b7 \u00b7<\/h2>\n<h2>When Do I Need Distributed Training?<\/h2>\n<p>Distributed training is a method that enables you to scale models and data to multiple devices for parallel execution. It generally yields a speedup that grows roughly linearly with the number of GPUs involved.<\/p>\n<p>Distributed training is useful when you:<\/p>\n<ul>\n<li>Need to speed up training because you have a large amount of data.<\/li>\n<li>Work with large batch sizes that cannot fit into the memory of a single GPU.<\/li>\n<li>Have a large model parameter count that doesn\u2019t fit into the memory of a single GPU.<\/li>\n<li>Have a stack of GPUs at your disposal. (wouldn&#8217;t that be nice?)<\/li>\n<\/ul>\n<p>The first two of these cases, speeding up training and handling large batch sizes, can be addressed by a DDP approach, where the data is split evenly across all devices. DDP is the most common form of multi-GPU and multi-node training today, and is the main focus of this tutorial.<\/p>\n<p>The third case (large model parameter count) is becoming increasingly common, particularly as models like GPT-3, BERT, and Stable Diffusion grow in size exponentially. 
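To get a feel for the scale involved, here is a back-of-envelope memory estimate. The numbers are illustrative assumptions (a GPT-3-scale parameter count, 4-byte fp32 weights), and the calculation deliberately ignores activations, gradients, and optimizer state, which only make things worse:

```python
# Back-of-envelope estimate of model weight memory (illustrative numbers only:
# a GPT-3-scale parameter count, fp32 weights, ignoring activations, gradients,
# and optimizer state).
params = 175e9        # ~175 billion parameters (assumed, GPT-3 scale)
bytes_per_param = 4   # fp32 = 4 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(f"{weights_gb:.0f} GB of weights alone")  # prints "700 GB of weights alone"
```

Even before counting activations or optimizer state, the weights alone exceed the memory of any single accelerator, which is why such models must be spread across many devices.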
With billions of parameters, these models are too large to fit into a single multi-GPU machine. In other words, without distributed training, these models wouldn\u2019t exist.<\/p>\n<h2 style=\"text-align: center;\">\u00b7 \u00b7 \u00b7<\/h2>\n<h2>How Does Distributed Training Work?<\/h2>\n<p>The key to understanding distributed training is that the optimization itself does not change compared to a single-device setting: we minimize the same cost function with the same model and optimizer.<\/p>\n<p>The difference is that the data gets split across multiple devices, which leads to a reduced batch size per GPU. Gradient computation thus creates no additional memory overhead and runs in parallel. This works because of the linearity of the gradient operator: computing the gradient for individual data samples and then averaging them is the same as computing the gradient using the whole batch of data at once on a single device.<\/p>\n<div id=\"attachment_5646743\" style=\"width: 1410px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5646743\" class=\"size-full wp-image-5646743\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Linearity.png\" alt=\"\" width=\"1400\" height=\"323\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Linearity.png 1400w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Linearity-300x69.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Linearity-1024x236.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Linearity-600x138.png 600w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Linearity-600x138@2x.png 1200w\" sizes=\"(max-width: 1400px) 100vw, 1400px\" \/><p id=\"caption-attachment-5646743\" class=\"wp-caption-text\">Linearity: The sum of the gradients computed in each node is 
the same as the gradient of the combined cost function computed on one node.<\/p><\/div>\n<h3><strong>Step 1<\/strong><\/h3>\n<p>We start with the same copy of the model weights on all devices (handled by <strong><span style=\"color: #7345e4;\"><a style=\"color: #7345e4;\" href=\"https:\/\/pytorch.org\/docs\/stable\/generated\/torch.nn.parallel.DistributedDataParallel.html\">PyTorch\u2019s DistributedDataParallel<\/a><\/span><\/strong>). Each device gets its split of the data batch (handled by <strong><span style=\"color: #7345e4;\"><a style=\"color: #7345e4;\" href=\"https:\/\/pytorch.org\/docs\/stable\/data.html#torch.utils.data.distributed.DistributedSampler\">PyTorch\u2019s DistributedSampler<\/a><\/span><\/strong>) and performs a forward pass. This yields a different loss value per device.<\/p>\n<div id=\"attachment_5646744\" style=\"width: 970px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5646744\" class=\"size-full wp-image-5646744\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Forward.png\" alt=\"\" width=\"960\" height=\"540\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Forward.png 960w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Forward-300x169.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Forward-600x338.png 600w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/><p id=\"caption-attachment-5646744\" class=\"wp-caption-text\">Forward: Each device holds the same model weights but gets different data samples. Each device independently computes a loss value for that batch of data.<\/p><\/div>\n<h3><strong>Step 2<\/strong><\/h3>\n<p>Given the loss value, we can perform the backward pass, which computes the gradients of the loss with regard to the model weights. 
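The gradient linearity that makes this scheme work can be checked with a toy example. This is a pure-Python sketch with made-up data, using a 1-D least-squares model as a stand-in for a network:

```python
# Toy check of gradient linearity: averaging per-shard gradients equals the
# full-batch gradient (for equal shard sizes). Pure Python, no framework;
# the data values below are arbitrary illustrative numbers.

def grad(w, xs, ys):
    """Gradient of the mean squared error of (w*x - y) w.r.t. w over a batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5

full = grad(w, xs, ys)                # gradient on the whole batch, one device
shard_a = grad(w, xs[:2], ys[:2])     # "GPU 0" sees the first half of the data
shard_b = grad(w, xs[2:], ys[2:])     # "GPU 1" sees the second half
averaged = (shard_a + shard_b) / 2    # what the gradient sync computes

print(abs(full - averaged) < 1e-12)   # True: the two agree
```

Because both shards have the same size, the average of the per-shard gradients is exactly the full-batch gradient, up to floating-point rounding.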
We now have a different gradient per GPU device.<\/p>\n<div id=\"attachment_5646745\" style=\"width: 970px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5646745\" class=\"wp-image-5646745 size-full\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Backward.jpg\" alt=\"\" width=\"960\" height=\"540\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Backward.jpg 960w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Backward-300x169.jpg 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Backward-600x338.jpg 600w\" sizes=\"(max-width: 960px) 100vw, 960px\" \/><p id=\"caption-attachment-5646745\" class=\"wp-caption-text\">Backward: Each device computes gradients independently. The gradients get averaged, and all devices receive the same averaged gradients for the weight update.<\/p><\/div>\n<h3><strong>Step 3<\/strong><\/h3>\n<p>We synchronize the gradients by summing them up and dividing them by the number of GPU devices involved. Note that this happens during the backward pass. At the end of this process, each GPU now has the same averaged gradients.<\/p>\n<h3><strong>Step 4<\/strong><\/h3>\n<p>Finally, all models can update their weights with the synchronized gradient. Because the gradient is the same on all GPUs, we again end up with the same model weights on all devices, and the next training step can begin.<\/p>\n<h2 style=\"text-align: center;\">\u00b7 \u00b7 \u00b7<\/h2>\n<h2>Challenges with DDP<\/h2>\n<p>Splitting data evenly across multiple devices is done using the <strong><span style=\"color: #7345e4;\"><a style=\"color: #7345e4;\" href=\"https:\/\/pytorch.org\/docs\/stable\/data.html#torch.utils.data.distributed.DistributedSampler\">DistributedSampler<\/a><\/span><\/strong>. 
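The kind of split the sampler produces can be sketched in pure Python. The interleaving and padding below mimic DistributedSampler's documented behavior; the dataset size (10) and GPU count (4) are made-up illustrative numbers:

```python
# Sketch of an even data split across ranks, mimicking the padding behavior of
# torch.utils.data.DistributedSampler (pure Python, no torch). Dataset size and
# world size are made-up illustrative numbers.
import math

def shard_indices(dataset_len, world_size, rank):
    per_rank = math.ceil(dataset_len / world_size)
    indices = list(range(dataset_len))
    # Pad by repeating samples from the front so the total divides evenly.
    indices += indices[: per_rank * world_size - dataset_len]
    return indices[rank::world_size]  # interleaved assignment, like the sampler

for rank in range(4):
    print(rank, shard_indices(10, 4, rank))
# e.g. rank 2 receives [2, 6, 0]: sample 0 appears on two ranks
```

With 10 samples and 4 ranks, every rank receives exactly 3 indices, at the cost of samples 0 and 1 being processed twice. That duplication is exactly what can skew evaluation metrics, as described next.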
To balance the workload for all GPU workers and avoid synchronization issues, this strategy inserts duplicated samples if the size of the dataset is not evenly divisible by the number of GPUs. Lightning takes care of all of this automatically. However, during testing\/evaluation, duplicated samples can lead to incorrect metrics (e.g., test accuracy). Lightning is currently working on an<strong><span style=\"color: #7345e4;\"> <a style=\"color: #7345e4;\" href=\"https:\/\/github.com\/Lightning-AI\/lightning\/issues\/3325\">\u201cuneven\u201d DDP feature<\/a><\/span><\/strong> to alleviate this shortcoming in the future.<\/p>\n<p>Syncing metrics across devices, the way gradients are synced, creates a communication overhead that can slow down training. If you want to sync metrics easily, you can try <span style=\"color: #7345e4;\"><strong><a style=\"color: #7345e4;\" href=\"https:\/\/torchmetrics.readthedocs.io\/en\/stable\/\">torchmetrics<\/a><\/strong><\/span>.<\/p>\n<p>If processes get stuck (a subset hangs or errors), you might end up with zombie processes and have to kill them manually. Lightning has a mechanism to detect deadlocks and will exit all processes after a specific timeout.<\/p>\n<h2>Key Points to Note<\/h2>\n<p>Only gradients are synced across devices to update the model weights. No other metrics or losses are synced by default, although you can do so using <strong><span style=\"color: #7345e4;\"><a style=\"color: #7345e4;\" href=\"https:\/\/pytorch.org\/docs\/stable\/distributed.html#torch.distributed.all_reduce_multigpu\">all_reduce<\/a><\/span><\/strong>, which we will learn about in the next blog post.<\/p>\n<p>DDP splits the data, not the model weights. So if your model can\u2019t be loaded onto a single GPU, DDP can\u2019t help here. 
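The all_reduce collective mentioned above can be pictured with a tiny stand-in. The real call is torch.distributed.all_reduce and requires an initialized process group; here the "devices" are just list entries and the per-device loss values are made up:

```python
# Pure-Python stand-in for an all-reduce with a SUM op: after the collective,
# every simulated device holds the same reduced value. (The real API is
# torch.distributed.all_reduce; the per-device losses below are made up.)
device_losses = [1.0, 2.0, 3.0, 2.0]              # one loss value per "GPU"
total = sum(device_losses)                         # the SUM reduction
after_all_reduce = [total] * len(device_losses)    # every rank receives the result
mean_loss = after_all_reduce[0] / len(device_losses)
print(mean_loss)  # prints 2.0 -- the same on every rank
```

Dividing the reduced sum by the world size turns it into an average, which is the same pattern the gradient sync in Step 3 uses.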
You need to adopt more advanced strategies, such as <strong><span style=\"color: #7345e4;\"><a style=\"color: #7345e4;\" href=\"https:\/\/www.deepspeed.ai\/\">DeepSpeed<\/a><\/span><\/strong> or <strong><span style=\"color: #7345e4;\"><a style=\"color: #7345e4;\" href=\"https:\/\/pytorch.org\/tutorials\/intermediate\/FSDP_tutorial.html\">Sharding<\/a><\/span><\/strong>, which we will discuss in the following blog posts.<\/p>\n<p>We recommend using DDP over <strong><span style=\"color: #7345e4;\"><a style=\"color: #7345e4;\" href=\"https:\/\/pytorch.org\/docs\/stable\/generated\/torch.nn.DataParallel.html\">DataParallel<\/a><\/span><\/strong>. You can read more about why <strong><span style=\"color: #7345e4;\"><a style=\"color: #7345e4;\" href=\"https:\/\/pytorch-lightning.readthedocs.io\/en\/stable\/guides\/speed.html?highlight=DDP#prefer-ddp-over-dp\">here<\/a><\/span><\/strong>.<\/p>\n<h2>Further Reading<\/h2>\n<ol>\n<li><a href=\"https:\/\/www.youtube.com\/watch?v=3XUG7cjte2U\"><span style=\"color: #7345e4;\"><strong>PyTorch Distributed by Shen Li<\/strong><\/span><\/a> (tech lead for PyTorch Distributed team)<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>What Is Distributed Training? The GPU is the most popular choice of device for rapid deep learning research. This is a direct result of the speed, optimizations, and ease of use that these frameworks offer. From PyTorch to TensorFlow, support for GPUs is built into all of today&#8217;s major deep learning frameworks. 
Thankfully, running experiments<a class=\"excerpt-read-more\" href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/\" title=\"ReadGuide to Distributed Training\">&#8230; Read more &raquo;<\/a><\/p>\n","protected":false},"author":16,"featured_media":5646746,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[41],"tags":[96,116,117,97,51],"glossary":[],"acf":{"additional_authors":false,"hide_from_archive":false,"content_type":"Blog Post","custom_styles":""},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Guide to Distributed Training - Lightning AI<\/title>\n<meta name=\"description\" content=\"In this tutorial, we show you how to scale your models and data to multiple GPUs and servers by using distributed training.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Guide to Distributed Training - Lightning AI\" \/>\n<meta property=\"og:description\" content=\"In this tutorial, we show you how to scale your models and data to multiple GPUs and servers by using distributed training.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/\" \/>\n<meta property=\"og:site_name\" content=\"Lightning AI\" \/>\n<meta property=\"article:published_time\" content=\"2022-10-26T19:39:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-03-07T22:22:32+00:00\" \/>\n<meta property=\"og:image\" 
content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Distributed-Training.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1740\" \/>\n\t<meta property=\"og:image:height\" content=\"900\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"JP Hennessy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:site\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"JP Hennessy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/\"},\"author\":{\"name\":\"JP Hennessy\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\"},\"headline\":\"Guide to Distributed Training\",\"datePublished\":\"2022-10-26T19:39:42+00:00\",\"dateModified\":\"2023-03-07T22:22:32+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/\"},\"wordCount\":1108,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Distributed-Training.png\",\"keywords\":[\"ai\",\"distributed\",\"distributed 
training\",\"ml\",\"pytorch\"],\"articleSection\":[\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/\",\"url\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/\",\"name\":\"Guide to Distributed Training - Lightning AI\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Distributed-Training.png\",\"datePublished\":\"2022-10-26T19:39:42+00:00\",\"dateModified\":\"2023-03-07T22:22:32+00:00\",\"description\":\"In this tutorial, we show you how to scale your models and data to multiple GPUs and servers by using distributed 
training.\",\"breadcrumb\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/#primaryimage\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Distributed-Training.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/10\/Distributed-Training.png\",\"width\":1740,\"height\":900},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/distributed-training-guide\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lightning.ai\/pages\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Guide to Distributed Training\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lightning.ai\/pages\/#website\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"name\":\"Lightning AI\",\"description\":\"The platform for teams to build AI.\",\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lightning.ai\/pages\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\",\"name\":\"Lightning 
AI\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"width\":1744,\"height\":856,\"caption\":\"Lightning AI\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/LightningAI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\",\"name\":\"JP Hennessy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"caption\":\"JP Hennessy\"},\"url\":\"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","_links":{"self":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5646726"}],"collection":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/comments?post=5646726"}],"version-history":[{"count":0,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5646726\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media\/5646746"}],"wp:attachment":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media?parent=5646726"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/categories?post=5646726"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/tags?post=5646726"},{"taxonomy":"glossary","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/glossary?post=5646726"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}