DataParallelStrategy

class lightning_fabric.strategies.DataParallelStrategy(accelerator=None, parallel_devices=None, checkpoint_io=None, precision=None)

Bases: lightning_fabric.strategies.parallel.ParallelStrategy

Implements data-parallel training in a single process: the model is replicated to each device, and each replica processes a split of the data.
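As a rough mental model, the replicate, split and gather cycle can be sketched in plain Python. This is a minimal, stdlib-only sketch, not how Lightning or PyTorch implement it. `scatter` and `data_parallel_forward` are hypothetical helpers, a "device" is just a position in a list, and the "model" is any callable applied to one sample at a time.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def scatter(batch: Sequence[T], num_devices: int) -> List[List[T]]:
    """Split `batch` into at most `num_devices` contiguous chunks of near-equal size."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if not batch:
        return []
    chunk = -(-len(batch) // num_devices)  # ceiling division
    return [list(batch[i:i + chunk]) for i in range(0, len(batch), chunk)]


def data_parallel_forward(model: Callable[[T], U], batch: Sequence[T], num_devices: int) -> List[U]:
    """Replicate `model` per chunk, run each replica on its chunk, then gather outputs in order."""
    chunks = scatter(batch, num_devices)
    # Stand-in for copying the weights to each device: every replica is the same callable here.
    replicas = [model for _ in chunks]
    per_device_outputs = [[replica(x) for x in chunk] for replica, chunk in zip(replicas, chunks)]
    # Gather: concatenate the per-device outputs back into one batch on the root device.
    return [y for outputs in per_device_outputs for y in outputs]
```

Because the split uses ceiling division, a batch with fewer samples than devices simply uses fewer replicas.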

all_reduce(collection, group=None, reduce_op='mean')

Reduces the given tensor, or collection of tensors (e.g. across GPUs/processes).

Parameters

collection – the tensor, or collection of tensors, to sync and reduce
group (Optional[Any]) – the process group to reduce
reduce_op (Union[ReduceOp, str, None]) – the reduction operation. Defaults to ‘mean’. Can also be a string ‘sum’ or ReduceOp.

Return type

TypeVar(TReduce)
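Because the first argument may be a collection, a reduction has to walk the containers and reduce only the tensor leaves. The sketch below assumes a hypothetical `PerDevice` type in place of a real tensor (one value per device) and models only the string ops `'mean'` and `'sum'`. `all_reduce_sketch` is illustrative and is not the strategy's actual code.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Any, Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class PerDevice:
    """Hypothetical stand-in for a tensor: one scalar value per device."""
    values: Tuple[float, ...]


# Only the two string ops are modeled; ReduceOp objects are not.
REDUCE_OPS: Dict[str, Callable[[Sequence[float]], float]] = {"mean": fmean, "sum": sum}


def all_reduce_sketch(collection: Any, reduce_op: str = "mean") -> Any:
    """Reduce every PerDevice leaf while preserving the surrounding dict/list/tuple structure."""
    op = REDUCE_OPS[reduce_op]
    if isinstance(collection, PerDevice):
        return op(collection.values)
    if isinstance(collection, dict):
        return {k: all_reduce_sketch(v, reduce_op) for k, v in collection.items()}
    if isinstance(collection, (list, tuple)):
        return type(collection)(all_reduce_sketch(v, reduce_op) for v in collection)
    return collection  # anything that is not a "tensor" passes through unchanged
```

For example, `{"loss": PerDevice((1.0, 3.0)), "step": 7}` reduces to `{"loss": 2.0, "step": 7}` with the default `'mean'`.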

barrier(*args, **kwargs)

Synchronizes all processes, blocking each one until the whole group has entered this function.

Parameters: name – an optional name to pass into barrier.
Return type: None
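This strategy runs in a single process, so there is no second process to wait for. The blocking rule itself is easiest to see with threads. The sketch below uses Python's `threading.Barrier` purely as an analogy for a process-group barrier, and `run_with_barrier` is a hypothetical helper.

```python
import threading
from typing import List


def run_with_barrier(num_workers: int) -> List[str]:
    """Each worker logs 'before-<rank>', waits at the barrier, then logs 'after-<rank>'."""
    barrier = threading.Barrier(num_workers)
    log: List[str] = []
    lock = threading.Lock()

    def worker(rank: int) -> None:
        with lock:
            log.append(f"before-{rank}")
        barrier.wait()  # blocks until all `num_workers` threads have called wait()
        with lock:
            log.append(f"after-{rank}")

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

However the threads are scheduled, every "before" entry lands in the log ahead of every "after" entry. No worker can pass `wait()` until the last one has arrived.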

batch_to_device(batch, device=None)

Moves the batch to the correct device.

The returned batch is of the same type as the input batch, just having all tensors on the correct device.

Parameters

batch (Any) – The batch of samples to move to the correct device
device (Optional[device]) – The target device

Return type

Any
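The "same type, tensors moved" contract amounts to a recursive walk over the batch. Below is a minimal sketch. It assumes a hypothetical `FakeTensor` class with a `to(device)` method in place of `torch.Tensor` and handles only dicts, lists, tuples and namedtuples. Lightning's own traversal may support other container types.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor that only records which device it lives on."""
    data: Any
    device: str = "cpu"

    def to(self, device: str) -> "FakeTensor":
        # Return a new object rather than mutating this one.
        return replace(self, device=device)


def batch_to_device_sketch(batch: Any, device: str) -> Any:
    """Return a batch of the same container type with every FakeTensor moved to `device`."""
    if isinstance(batch, FakeTensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return type(batch)((k, batch_to_device_sketch(v, device)) for k, v in batch.items())
    if isinstance(batch, tuple) and hasattr(batch, "_fields"):  # namedtuple
        return type(batch)(*(batch_to_device_sketch(v, device) for v in batch))
    if isinstance(batch, (list, tuple)):
        return type(batch)(batch_to_device_sketch(v, device) for v in batch)
    return batch  # strings, ints and other leaves are returned unchanged
```

The input batch is left untouched, and non-tensor entries such as sample IDs come back as they were.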

broadcast(obj, src=0)

Broadcasts an object to all processes.

Parameters

obj (TypeVar(TBroadcast)) – the object to broadcast
src (int) – source rank

Return type

TypeVar(TBroadcast)
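Broadcast semantics can be simulated by giving each rank its own starting object. Afterwards, every rank holds the source rank's object. `broadcast_sketch` is a hypothetical helper that models ranks as list positions. With a single participant, broadcasting from rank 0 hands back the object that was passed in.

```python
from typing import List, TypeVar

T = TypeVar("T")


def broadcast_sketch(per_rank: List[T], src: int = 0) -> List[T]:
    """Simulate a broadcast: every rank ends up holding the object that rank `src` started with."""
    if not 0 <= src < len(per_rank):
        raise ValueError(f"src={src} is not a valid rank for world size {len(per_rank)}")
    return [per_rank[src] for _ in per_rank]
```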

module_to_device(module)

Moves the model to the correct device.

Return type: None

reduce_boolean_decision(decision, all=True)

Reduces a boolean decision over distributed processes. By default, it is analogous to all from the standard library, returning True only if every input decision evaluates to True. If all is set to False, it behaves like any instead.

Parameters

decision (bool) – A single input decision.
all (bool) – Whether to logically emulate all or any. Defaults to True.

Returns

The reduced boolean decision.

Return type

bool
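The all/any behaviour can be pictured as collecting one decision per process and combining them. This sketch assumes the decisions have already been gathered into a list; the gathering step is not modeled. `reduce_boolean_decision_sketch` is hypothetical. It keeps the parameter name `all` from the documented signature, which shadows the built-in, so it calls the built-ins through `builtins`.

```python
import builtins
from typing import Sequence


def reduce_boolean_decision_sketch(decisions: Sequence[bool], all: bool = True) -> bool:
    """Combine one decision per process, emulating all() by default or any() when all=False."""
    # The parameter shadows the built-in all(), so call the built-ins explicitly.
    if all:
        return builtins.all(decisions)
    return builtins.any(decisions)
```

With a single process, the list holds one decision, so both modes return that decision unchanged.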

setup_module(module)

Wraps the given model into a DataParallel module.

Return type: DataParallel

property distributed_sampler_kwargs: None

Arguments for the DistributedSampler.

If this property is not defined, or it returns None, the DistributedSampler will not be used.

Return type: None

property root_device: torch.device

Returns the root device.

Return type: device