DDPSpawnPlugin
- class pytorch_lightning.plugins.training_type.DDPSpawnPlugin(parallel_devices=None, num_nodes=None, cluster_environment=None, checkpoint_io=None, sync_batchnorm=None, ddp_comm_state=None, ddp_comm_hook=None, ddp_comm_wrapper=None, **kwargs)
Bases: pytorch_lightning.plugins.training_type.parallel.ParallelPlugin
Spawns processes using the torch.multiprocessing.spawn() method and joins processes after training finishes.
- add_to_queue(trainer, queue)
Appends the trainer.callback_metrics dictionary to the given queue. To avoid issues with memory sharing, we cast the data to numpy.
- barrier(*args, **kwargs)
Synchronizes all processes, blocking each one until the whole group has entered this function.
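The blocking behaviour described above can be illustrated with a small analogy. This is a minimal sketch that uses threads and the standard library's `threading.Barrier` in place of distributed processes; `run_with_barrier` and `world_size` are hypothetical names chosen for illustration, and nothing here calls the plugin itself.

```python
import threading


def run_with_barrier(world_size):
    """Start `world_size` workers and record when each passes the barrier."""
    barrier = threading.Barrier(world_size)
    events = []
    lock = threading.Lock()

    def worker(rank):
        with lock:
            events.append(("before", rank))
        # Each worker blocks here until all `world_size` workers have arrived.
        barrier.wait()
        with lock:
            events.append(("after", rank))

    threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(world_size)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events
```

Calling `run_with_barrier(4)` returns a list in which every `"before"` entry precedes every `"after"` entry, because no worker can continue until the whole group has reached the barrier.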
- get_from_queue(trainer, queue)
Retrieves the trainer.callback_metrics dictionary from the given queue. To preserve consistency, we cast the data back to torch.Tensor.
- post_dispatch(trainer)
Hook to do something after the training/evaluation/prediction finishes.
- reduce(tensor, group=None, reduce_op='mean')
Reduces a tensor from several distributed processes to one aggregated tensor.
- Parameters
- Return type
- Returns
The reduced value. If the input was not a tensor, the output is returned unchanged.
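To make the reduction semantics concrete, the following single-process sketch simulates what each rank would receive. It is only an illustration under stated assumptions: plain Python lists stand in for tensors, `simulated_reduce` is a hypothetical helper, no distributed backend is involved, and the `"sum"` option is included for contrast rather than taken from this page.

```python
def simulated_reduce(per_rank_values, reduce_op="mean"):
    """Return the value each simulated rank holds after the reduction.

    `per_rank_values` has one entry per rank. Lists of floats play the role
    of tensors; anything else is treated as a non-tensor input.
    """
    if not isinstance(per_rank_values[0], list):
        # Non-tensor input: each rank gets its own value back unchanged.
        return list(per_rank_values)

    # Element-wise sum across ranks.
    summed = [sum(column) for column in zip(*per_rank_values)]
    if reduce_op == "sum":
        reduced = summed
    elif reduce_op == "mean":
        reduced = [total / len(per_rank_values) for total in summed]
    else:
        raise ValueError(f"unsupported reduce_op in this sketch: {reduce_op!r}")

    # Every rank ends up with the same aggregated value.
    return [list(reduced) for _ in per_rank_values]
```

With two simulated ranks holding `[1.0, 2.0]` and `[3.0, 6.0]`, the default `"mean"` reduction leaves both ranks with `[2.0, 4.0]`, while string inputs such as `["a", "b"]` pass through untouched.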
- spawn(function, *args, return_result=True, **kwargs)
Spawn processes that run the given function.
- Parameters
- function (Callable) – The function to spawn processes from.
- *args (Any) – Optional positional arguments that will be passed to the function in addition to the process index. These arguments must be pickleable.
- return_result (bool) – If True, copies the output of the function from process 0 to the main process and returns it.
- **kwargs (Any) – Optional named arguments that will be passed to the function in addition to the process index. These arguments must be pickleable.
- Return type
- Returns
The output of the function of process 0.
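Since the positional and named arguments must be pickleable, it can help to check them before spawning. The sketch below uses only the standard library's `pickle` module; `find_unpicklable` is a hypothetical helper written for illustration and is not part of PyTorch Lightning.

```python
import pickle


def find_unpicklable(args, kwargs):
    """Return labels of the arguments that cannot be pickled."""
    labelled = [(f"args[{index}]", value) for index, value in enumerate(args)]
    labelled += list(kwargs.items())

    failures = []
    for label, value in labelled:
        try:
            pickle.dumps(value)
        except (pickle.PicklingError, AttributeError, TypeError):
            # Lambdas, local functions, and open handles typically end up here.
            failures.append(label)
    return failures
```

An empty result suggests the arguments are safe to hand to a spawned process. A lambda passed as an argument would be reported, so it is usually better to pass a module-level function instead.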
- teardown()
This method is called to tear down the training process.
It is the right place to release memory and free other resources.
- Return type
- property root_device
Return the root device.