DDPShardedStrategy
- class pytorch_lightning.strategies.DDPShardedStrategy(accelerator=None, parallel_devices=None, cluster_environment=None, checkpoint_io=None, precision_plugin=None, ddp_comm_state=None, ddp_comm_hook=None, ddp_comm_wrapper=None, model_averaging_period=None, process_group_backend=None, **kwargs)
Bases: pytorch_lightning.strategies.ddp.DDPStrategy
Optimizer and gradient sharded training provided by FairScale.
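As a usage sketch, the strategy is typically handed to the `Trainer` rather than used on its own. The example below assumes a Lightning release that still ships `DDPShardedStrategy`, a working FairScale installation, and two GPUs. The helper name `build_sharded_trainer` and the device count are illustrative and not part of the library.

```python
def build_sharded_trainer(num_devices=2):
    """Return a Trainer configured for sharded DDP (illustrative helper).

    Imports are deferred so that defining this function does not itself
    require pytorch_lightning or fairscale to be installed.
    """
    from pytorch_lightning import Trainer
    from pytorch_lightning.strategies import DDPShardedStrategy

    # All constructor arguments are left at their defaults (None).
    strategy = DDPShardedStrategy()
    return Trainer(accelerator="gpu", devices=num_devices, strategy=strategy)
```

Calling `build_sharded_trainer()` and then `trainer.fit(model)` would start training under the stated assumptions.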
- block_backward_sync()
Blocks gradient syncing on the backward pass.
This is useful for skipping sync when accumulating gradients, which reduces communication overhead.
Returns: context manager with sync behaviour off
- Return type
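The idea behind blocking sync during gradient accumulation can be sketched with a toy model. Everything below is hypothetical: `ToyShardedModel`, its `no_sync` method, and `run_accumulation` are stand-ins written for illustration, not FairScale or Lightning classes. The sketch only counts how many backward passes would trigger communication.

```python
from contextlib import contextmanager


class ToyShardedModel:
    """Hypothetical stand-in for a wrapped model; not a FairScale class."""

    def __init__(self):
        self.sync_enabled = True
        self.sync_calls = 0

    @contextmanager
    def no_sync(self):
        # Temporarily disable gradient sync and restore the old value on exit.
        previous = self.sync_enabled
        self.sync_enabled = False
        try:
            yield
        finally:
            self.sync_enabled = previous

    def backward(self):
        # A real wrapper would communicate gradients here; we only count.
        if self.sync_enabled:
            self.sync_calls += 1


def run_accumulation(model, accumulate_steps, total_batches):
    """Sync only on the last micro-batch of each accumulation window."""
    for batch_idx in range(total_batches):
        is_boundary = (batch_idx + 1) % accumulate_steps == 0
        if is_boundary:
            model.backward()
        else:
            with model.no_sync():
                model.backward()
    return model.sync_calls
```

With an accumulation window of 4 over 8 batches, only the two boundary batches count as synchronized in this toy, instead of all eight.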
- optimizer_state(optimizer)
Returns the state of an optimizer.
Allows for syncing/collating optimizer state from processes in custom plugins.
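To illustrate what collating optimizer state from several processes might involve, here is a minimal sketch in plain Python. It assumes each rank holds the optimizer state for a disjoint subset of parameters. The helper `collate_optimizer_state` and the sample dictionaries are hypothetical and do not reflect the library's actual data layout.

```python
def collate_optimizer_state(shards):
    """Merge per-rank state dicts into one (hypothetical helper).

    Each shard maps a parameter name to that parameter's optimizer state.
    Shards are assumed to be disjoint, as in optimizer state sharding.
    """
    merged = {}
    for rank, shard in enumerate(shards):
        for name, state in shard.items():
            if name in merged:
                raise ValueError(
                    f"parameter {name!r} appears in more than one shard (rank {rank})"
                )
            merged[name] = state
    return merged


# Illustrative per-rank shards; the keys and values are made up.
rank0 = {"layer1.weight": {"step": 10}}
rank1 = {"layer2.weight": {"step": 10}}
full_state = collate_optimizer_state([rank0, rank1])
```

The merged dictionary then contains an entry for every parameter, which is the shape a checkpoint would usually need.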
- property lightning_module: Optional[pytorch_lightning.core.lightning.LightningModule]
Returns the pure LightningModule without potential wrappers.
- Return type: Optional[pytorch_lightning.core.lightning.LightningModule]
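Returning the module "without potential wrappers" amounts to peeling off wrapper layers until the innermost object is reached. The sketch below shows that pattern with hypothetical classes. `ToyWrapper`, `ToyLightningModule`, and `unwrap` are illustrative names, and the assumption that wrappers expose the wrapped object as `.module` is made only for this example.

```python
class ToyWrapper:
    """Hypothetical wrapper exposing the wrapped object as `.module`."""

    def __init__(self, module):
        self.module = module


class ToyLightningModule:
    """Stand-in for a LightningModule; deliberately has no `.module`."""


def unwrap(model):
    # Peel wrappers one layer at a time until none remain.
    while isinstance(model, ToyWrapper):
        model = model.module
    return model


inner = ToyLightningModule()
wrapped = ToyWrapper(ToyWrapper(inner))
```

Here `unwrap(wrapped)` returns the same object as `inner`, however many wrapper layers were applied.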