What is a Strategy?¶
The Strategy controls the model distribution across training, evaluation, and prediction to be used by the Trainer. You can select one by passing a shorthand alias ("ddp", "ddp_spawn", "deepspeed", and so on) or a custom strategy instance to the strategy parameter of the Trainer.
The Strategy in PyTorch Lightning handles the following responsibilities:
- Launch and teardown of training processes (if applicable)
- Setup of communication between processes (NCCL, GLOO, MPI, and so on)
- A unified communication interface for reduction, broadcast, and so on
- Ownership of the LightningModule
- Ownership of the optimizers and schedulers
Strategy is a composition of one Accelerator, one Precision Plugin, a CheckpointIO plugin and other optional plugins such as the ClusterEnvironment.
![Illustration of the Strategy as a composition of the Accelerator and several plugins](https://pl-public-data.s3.amazonaws.com/docs/static/images/strategies/overview.jpeg)
We expose Strategies mainly for expert users who want to extend Lightning with support for new hardware or new distributed backends (e.g., a backend not yet supported by PyTorch itself).
Selecting a Built-in Strategy¶
Built-in strategies can be selected in two ways:

1. Pass the shorthand name to the strategy Trainer argument.
2. Import a Strategy from pytorch_lightning.strategies, instantiate it, and pass it to the strategy Trainer argument.

The latter allows you to configure further options on the specific strategy. Here are some examples:
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

# Training with the DistributedDataParallel strategy on 4 GPUs
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=4)
# Training with the DistributedDataParallel strategy on 4 GPUs, with options configured
trainer = Trainer(strategy=DDPStrategy(find_unused_parameters=False), accelerator="gpu", devices=4)
# Training with the DDP Spawn strategy using auto accelerator selection
trainer = Trainer(strategy="ddp_spawn", accelerator="auto", devices=4)
# Training with the DeepSpeed strategy on available GPUs
trainer = Trainer(strategy="deepspeed", accelerator="gpu", devices="auto")
# Training with the DDP strategy using 3 CPU processes
trainer = Trainer(strategy="ddp", accelerator="cpu", devices=3)
# Training with the DDP Spawn strategy on 8 TPU cores
trainer = Trainer(strategy="ddp_spawn", accelerator="tpu", devices=8)
# Training with the default IPU strategy on 8 IPUs
trainer = Trainer(accelerator="ipu", devices=8)
The table below lists all relevant strategies available in Lightning with their corresponding shorthand name:

Name | Class | Description
---|---|---
bagua | BaguaStrategy | Strategy for training using the Bagua library, with advanced distributed training algorithms and system optimizations.
collaborative | CollaborativeStrategy | Strategy for training collaboratively on local machines or unreliable GPUs across the internet.
fsdp | DDPFullyShardedStrategy | Strategy for Fully Sharded Data Parallel training provided by FairScale.
ddp_sharded | DDPShardedStrategy | Optimizer and gradient sharded training provided by FairScale.
ddp_sharded_spawn | DDPSpawnShardedStrategy | Optimizer sharded training provided by FairScale, with spawned processes.
ddp_spawn | DDPSpawnStrategy | Spawns processes using torch.multiprocessing.spawn() and joins them after training finishes.
ddp | DDPStrategy | Strategy for multi-process single-device training on one or multiple nodes.
dp | DataParallelStrategy | Implements data-parallel training in a single process: the model is replicated to each device, and each device gets a split of the data.
deepspeed | DeepSpeedStrategy | Runs training with the DeepSpeed library, with training optimizations for large billion-parameter models.
horovod | HorovodStrategy | Strategy for Horovod distributed training integration.
hpu_parallel | HPUParallelStrategy | Strategy for distributed training on multiple HPU devices.
hpu_single | SingleHPUStrategy | Strategy for training on a single HPU device.
ipu_strategy | IPUStrategy | Strategy for training on IPU devices.
tpu_spawn | TPUSpawnStrategy | Strategy for training on multiple TPU devices using the xla_multiprocessing.spawn() method.
single_tpu | SingleTPUStrategy | Strategy for training on a single TPU device.
Create a Custom Strategy¶
Every strategy in Lightning is a subclass of one of the main base classes: Strategy, SingleDeviceStrategy, or ParallelStrategy.
![Strategy base classes](https://pl-public-data.s3.amazonaws.com/docs/static/images/strategies/hierarchy.jpeg)
As an expert user, you may choose to extend either an existing built-in Strategy or create a completely new one by subclassing the base classes.
from pytorch_lightning.strategies import DDPStrategy


class CustomDDPStrategy(DDPStrategy):
    def configure_ddp(self):
        # wrap the LightningModule with your own DistributedDataParallel
        # implementation (placeholder class and arguments)
        self.model = MyCustomDistributedDataParallel(
            self.model,
            device_ids=...,
        )

    def setup(self, trainer):
        # you can access the accelerator and plugins directly
        self.accelerator.setup()
        self.precision_plugin.connect(...)
The custom strategy can then be passed to the Trainer directly via the strategy parameter.
# custom strategy
trainer = Trainer(strategy=CustomDDPStrategy())
Since the strategy also hosts the Accelerator and various plugins, you can customize all of them to work together as you like:
# custom strategy, with new accelerator and plugins
accelerator = MyAccelerator()
precision_plugin = MyPrecisionPlugin()
strategy = CustomDDPStrategy(accelerator=accelerator, precision_plugin=precision_plugin)
trainer = Trainer(strategy=strategy)