BaguaStrategy
- class pytorch_lightning.strategies.BaguaStrategy(algorithm='gradient_allreduce', flatten=True, accelerator=None, parallel_devices=None, cluster_environment=None, checkpoint_io=None, precision_plugin=None, **bagua_kwargs)
Bases: pytorch_lightning.strategies.ddp.DDPStrategy
Strategy for training using the Bagua library, with advanced distributed training algorithms and system optimizations.
This strategy requires the bagua package to be installed. See the Bagua installation guide for more information.
The BaguaStrategy is only supported on GPU and on Linux systems.
- Parameters
  - algorithm (str) – Distributed algorithm used to do the actual communication and update. Built-in algorithms include “gradient_allreduce”, “bytegrad”, “decentralized”, “low_precision_decentralized”, “qadam” and “async”.
  - flatten (bool) – Whether to flatten the Bagua communication buckets. The flatten operation resets the data pointers of the bucket tensors so that they can use faster code paths.
  - bagua_kwargs (Union[Any, Dict[str, Any]]) – Additional keyword arguments that will be passed to initialize the Bagua algorithm. More details on the keyword arguments accepted by each algorithm can be found in the Bagua documentation.
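As a minimal sketch of how these parameters fit together, the snippet below collects the constructor arguments in a small helper and passes them to the strategy. The helper names `bagua_strategy_kwargs` and `build_trainer` are hypothetical. `sync_interval_ms` is assumed to be a keyword accepted by Bagua's “async” algorithm, and the `Trainer` arguments (`accelerator="gpu"`, `devices`) are assumptions about the surrounding setup. Actually building the trainer requires the bagua package, a GPU and Linux, so that call is left commented out.

```python
from typing import Any, Dict

# Built-in algorithm names listed in the parameter description above.
BUILTIN_ALGORITHMS = (
    "gradient_allreduce",
    "bytegrad",
    "decentralized",
    "low_precision_decentralized",
    "qadam",
    "async",
)


def bagua_strategy_kwargs(
    algorithm: str = "gradient_allreduce", flatten: bool = True, **bagua_kwargs: Any
) -> Dict[str, Any]:
    """Hypothetical helper: collect the arguments for BaguaStrategy in one dict."""
    if algorithm not in BUILTIN_ALGORITHMS:
        raise ValueError(f"{algorithm!r} is not one of the built-in algorithms")
    return {"algorithm": algorithm, "flatten": flatten, **bagua_kwargs}


def build_trainer(devices: int = 4, **strategy_kwargs: Any):
    """Hypothetical helper: imports lazily so this file loads without Lightning or Bagua."""
    import pytorch_lightning as pl
    from pytorch_lightning.strategies import BaguaStrategy

    strategy = BaguaStrategy(**strategy_kwargs)
    return pl.Trainer(strategy=strategy, accelerator="gpu", devices=devices)


# Assumed example: the "async" algorithm with a sync interval in milliseconds.
kwargs = bagua_strategy_kwargs("async", sync_interval_ms=100)
print(kwargs)
# trainer = build_trainer(devices=4, **kwargs)  # needs bagua, a GPU and Linux
```

The check against the built-in names is only a convenience of this sketch; it is not something the strategy itself is stated to enforce.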
- barrier(*args, **kwargs)
Synchronizes all processes, blocking each one until the whole group enters this function.
- reduce(tensor, group=None, reduce_op='mean')
Reduces a tensor from several distributed processes to one aggregated tensor.
- Parameters
- Return type
- Returns
The reduced value, except when the input was not a tensor, in which case the output remains unchanged.
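To illustrate the reduction semantics, here is a pure-Python sketch that simulates the aggregated result across processes. It does not call Bagua or Lightning: `simulate_reduce` is a hypothetical function, plain lists of floats stand in for tensors, and only the 'mean' and 'sum' operations are modelled.

```python
from typing import List, Sequence


def simulate_reduce(per_rank: Sequence[List[float]], reduce_op: str = "mean") -> List[float]:
    """Hypothetical stand-in: combine one same-length vector per process element-wise."""
    if reduce_op not in ("mean", "sum"):
        raise ValueError(f"unsupported reduce_op in this sketch: {reduce_op!r}")
    # zip(*per_rank) groups the i-th element of every process together.
    totals = [sum(column) for column in zip(*per_rank)]
    if reduce_op == "sum":
        return totals
    return [total / len(per_rank) for total in totals]


# Three simulated processes, each holding a two-element "tensor".
per_rank = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(simulate_reduce(per_rank))         # [3.0, 4.0]
print(simulate_reduce(per_rank, "sum"))  # [9.0, 12.0]
```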
- teardown()
This method is called to tear down the training process.
It is the right place to release memory and free other resources.
- Return type
- property lightning_module: pytorch_lightning.core.lightning.LightningModule
Returns the pure LightningModule without potential wrappers.
- Return type