DDPPlugin
- class pytorch_lightning.plugins.training_type.DDPPlugin(parallel_devices=None, num_nodes=None, cluster_environment=None, checkpoint_io=None, sync_batchnorm=None, ddp_comm_state=None, ddp_comm_hook=None, ddp_comm_wrapper=None, model_averaging_period=None, **kwargs)
Bases: pytorch_lightning.plugins.training_type.parallel.ParallelPlugin
Plugin for multi-process single-device training on one or multiple nodes.
The master process in each node spawns N-1 child processes via subprocess.Popen(), where N is the number of devices (e.g. GPUs) per node. It is very similar to how torch.distributed.launch launches processes.
- barrier(*args, **kwargs)
Synchronizes all processes, blocking each process until the whole group enters this function.
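The blocking semantics of a barrier can be illustrated with a plain threading analogy. This is a toy sketch using the standard library's threading.Barrier, not the torch.distributed collective the plugin actually calls; the three threads stand in for three distributed processes:

```python
import threading

# Toy analogy (assumption: NOT the real DDPPlugin implementation, which
# delegates to torch.distributed): a barrier blocks every participant
# until the whole group has reached it, then releases them all at once.
results = []
barrier = threading.Barrier(3)  # pretend world_size == 3

def worker(rank: int) -> None:
    results.append(f"rank {rank} before barrier")
    barrier.wait()  # blocks until all 3 workers arrive here
    results.append(f"rank {rank} after barrier")

threads = [threading.Thread(target=worker, args=(r,)) for r in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because no worker passes `barrier.wait()` until all three have called it, every "before barrier" entry lands in `results` ahead of every "after barrier" entry.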
- post_dispatch(trainer)
Hook to do something after the training/evaluation/prediction finishes.
- Return type
None
- reduce(tensor, group=None, reduce_op='mean')
Reduces a tensor from several distributed processes to one aggregated tensor.
- Parameters
- tensor – the tensor to sync and reduce
- group – the process group to gather results from; defaults to all processes (world)
- reduce_op – the reduction operation; defaults to 'mean'. Can also be a string 'sum' to calculate the sum during reduction
- Return type
Any
- Returns
the reduced value; if the input was not a tensor, the output remains unchanged
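The reduction semantics above can be sketched in plain Python. This is a toy stand-in: `toy_reduce` and its list-of-per-process-values input are illustrative assumptions, while the real method delegates to torch.distributed collectives across processes:

```python
# Hypothetical sketch of reduce() semantics (NOT the real implementation,
# which uses torch.distributed): numeric values from all processes are
# averaged when reduce_op == "mean", summed when reduce_op == "sum", and
# a non-tensor input is returned unchanged.
def toy_reduce(per_process_values, reduce_op="mean"):
    first = per_process_values[0]
    if not isinstance(first, (int, float)):  # the "not a tensor" case
        return first                         # output remains unchanged
    total = sum(per_process_values)
    if reduce_op == "mean":
        return total / len(per_process_values)
    if reduce_op == "sum":
        return total
    raise ValueError(f"unsupported reduce_op: {reduce_op}")
```

For example, `toy_reduce([1.0, 2.0, 3.0])` averages the three "per-process" values, while `toy_reduce(["not-a-tensor"])` passes its input straight through.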
- setup_environment()
Set up any processes or distributed connections.
This is called before the LightningModule/DataModule setup hook, which allows the user to access the accelerator environment before setup is complete.
- Return type
None
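The kind of connection information a cluster environment resolves at this point can be sketched as follows. `resolve_cluster_env` is a hypothetical helper, not the plugin's actual API; MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and RANK are the standard torch.distributed environment variables:

```python
import os

# A minimal sketch (assumption: NOT the actual ClusterEnvironment API) of
# the connection info a cluster environment resolves before the
# LightningModule/DataModule setup hooks run. The defaults below are
# illustrative fallbacks for single-process runs.
def resolve_cluster_env(environ=os.environ):
    return {
        "master_addr": environ.get("MASTER_ADDR", "127.0.0.1"),
        "master_port": int(environ.get("MASTER_PORT", 12910)),
        "world_size": int(environ.get("WORLD_SIZE", 1)),
        "global_rank": int(environ.get("RANK", 0)),
    }

# E.g. rank 3 of an 8-process job launched by some cluster scheduler:
cfg = resolve_cluster_env({"MASTER_ADDR": "10.0.0.2", "WORLD_SIZE": "8", "RANK": "3"})
```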
- teardown()
This method is called to tear down the training process.
It is the right place to release memory and free other resources.
- Return type
None
- property root_device: torch.device
Return the root device.