Strategy¶
The Strategy controls the model distribution across training, evaluation, and prediction used by the Trainer. It is selected by passing a strategy alias ("ddp", "ddp_spawn", "deepspeed", and so on) or a custom strategy instance to the strategy parameter of the Trainer.
The Strategy in PyTorch Lightning handles the following responsibilities:

- Launch and teardown of training processes (if applicable).
- Setup communication between processes (NCCL, GLOO, MPI, and so on).
- Provide a unified communication interface for reduction, broadcast, and so on.
- Owns the LightningModule.
- Handles/owns optimizers and schedulers.

Strategy also manages the accelerator, precision, and checkpointing plugins.
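As an illustration of this ownership, the resolved strategy object is exposed on the Trainer and can be inspected directly. The snippet below is a minimal sketch, assuming a CPU-only machine; the printed values are illustrative, and worker processes are only launched once fitting starts.
# Minimal sketch: the "ddp" alias is resolved into a DDPStrategy instance,
# which the Trainer exposes as `trainer.strategy`.
from pytorch_lightning import Trainer

trainer = Trainer(strategy="ddp", accelerator="cpu", devices=2)

print(type(trainer.strategy).__name__)  # DDPStrategy
print(trainer.strategy.root_device)     # the device this process trains on, e.g. cpu
print(trainer.strategy.is_global_zero)  # True only in the main process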
Training Strategies with Various Configs¶
# Training with the DistributedDataParallel strategy on 4 GPUs
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=4)
# Training with the custom DistributedDataParallel strategy on 4 GPUs
trainer = Trainer(strategy=DDPStrategy(...), accelerator="gpu", devices=4)
# Training with the DDP Spawn strategy using auto accelerator selection
trainer = Trainer(strategy="ddp_spawn", accelerator="auto", devices=4)
# Training with the DeepSpeed strategy on available GPUs
trainer = Trainer(strategy="deepspeed", accelerator="gpu", devices="auto")
# Training with the DDP strategy using 3 CPU processes
trainer = Trainer(strategy="ddp", accelerator="cpu", devices=3)
# Training with the DDP Spawn strategy on 8 TPU cores
trainer = Trainer(strategy="ddp_spawn", accelerator="tpu", devices=8)
# Training with the default IPU strategy on 8 IPUs
trainer = Trainer(accelerator="ipu", devices=8)
Create a Custom Strategy¶
Expert users may choose to extend an existing strategy by overriding its methods.
from pytorch_lightning.strategies import DDPStrategy


class CustomDDPStrategy(DDPStrategy):
    def configure_ddp(self):
        self.model = MyCustomDistributedDataParallel(
            self.model,
            device_ids=...,
        )
or by subclassing the base class Strategy to create entirely new ones (a sketch follows the snippet below). These custom strategies can then be passed into the Trainer directly via the strategy parameter.
# custom strategy
trainer = Trainer(strategy=CustomDDPStrategy())
# fully custom accelerator and plugins
accelerator = MyAccelerator()
precision_plugin = MyPrecisionPlugin()
training_strategy = CustomDDPStrategy(accelerator=accelerator, precision_plugin=precision_plugin)
trainer = Trainer(strategy=training_strategy)
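For a strategy written from scratch, one would subclass Strategy itself and implement its abstract communication and device-placement methods (reduce, broadcast, barrier, all_gather, model_to_device, and so on). The sketch below takes a lighter route and builds on SingleDeviceStrategy, which already provides those implementations; the class name and its logging behaviour are hypothetical, for illustration only.
# Minimal sketch of a new strategy layered on SingleDeviceStrategy.
# LoggingSingleDeviceStrategy and its printout are hypothetical.
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import SingleDeviceStrategy


class LoggingSingleDeviceStrategy(SingleDeviceStrategy):
    def model_to_device(self) -> None:
        # Announce the target device, then defer to the parent implementation.
        print(f"Moving the model to {self.root_device}")
        super().model_to_device()


trainer = Trainer(strategy=LoggingSingleDeviceStrategy(device="cpu"), accelerator="cpu", devices=1)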
The complete list of built-in strategies is provided below.
Built-In Training Strategies¶
Strategy Class | Description
BaguaStrategy | Strategy for training using the Bagua library, with advanced distributed training algorithms and system optimizations.
DDP2Strategy | DDP2 behaves like DP in one node, but synchronization across nodes behaves like in DDP.
DDPFullyShardedStrategy | Plugin for Fully Sharded Data Parallel provided by FairScale.
DDPShardedStrategy | Optimizer and gradient sharded training provided by FairScale.
DDPSpawnShardedStrategy | Optimizer sharded training provided by FairScale.
DDPSpawnStrategy | Spawns processes using the torch.multiprocessing.spawn() method and joins processes after training finishes.
DDPStrategy | Strategy for multi-process single-device training on one or multiple nodes.
DataParallelStrategy | Implements data-parallel training in a single process, i.e., the model gets replicated to each device and each gets a split of the data.
DeepSpeedStrategy | Provides capabilities to run training using the DeepSpeed library, with training optimizations for large billion-parameter models.
HorovodStrategy | Plugin for Horovod distributed training integration.
HPUParallelStrategy | Strategy for distributed training on multiple HPU devices.
IPUStrategy | Plugin for training on IPU devices.
ParallelStrategy | Plugin for training with multiple processes in parallel.
SingleDeviceStrategy | Strategy that handles communication on a single device.
SingleHPUStrategy | Strategy for training on a single HPU device.
SingleTPUStrategy | Strategy for training on a single TPU device.
Strategy | Base class for all strategies that change the behaviour of the training, validation and test loop.
TPUSpawnStrategy | Strategy for training on multiple TPU devices using the torch_xla.distributed.xla_multiprocessing.spawn() method.