DataParallelStrategy

class lightning_fabric.strategies.DataParallelStrategy(accelerator=None, parallel_devices=None, checkpoint_io=None, precision=None)

Bases: lightning_fabric.strategies.parallel.ParallelStrategy

Implements data-parallel training in a single process: the model is replicated to each device, and each replica receives a split of the data.

all_reduce(collection, group=None, reduce_op='mean')

Reduces the given tensor, or each tensor in the given collection (e.g. across GPUs/processes).

Parameters:

collection – the tensor, or collection of tensors, to sync and reduce
group (Optional[Any]) – the process group to reduce
reduce_op (Union[ReduceOp, str, None]) – the reduction operation. Defaults to ‘mean’. Can also be the string ‘sum’ or a ReduceOp.

Return type:

TypeVar(TReduce)
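
To illustrate what a ‘mean’ reduction over a collection looks like, here is a minimal pure-Python sketch. It is not the library's implementation: plain lists of numbers stand in for tensors, and the helper name mean_reduce is hypothetical. It assumes that each "tensor" holds one value per device and that reducing means averaging those values while leaving the container structure and non-numeric entries untouched.

```python
from statistics import fmean


def mean_reduce(collection):
    """Replace every list/tuple of numbers in a nested container with its mean.

    Hypothetical helper: lists of numbers stand in for tensors.
    """
    if isinstance(collection, dict):
        return {key: mean_reduce(value) for key, value in collection.items()}
    if isinstance(collection, (list, tuple)):
        # A non-empty sequence made only of numbers is treated as one "tensor".
        if collection and all(isinstance(item, (int, float)) for item in collection):
            return fmean(collection)
        # Otherwise recurse and rebuild the same container type.
        return type(collection)(mean_reduce(item) for item in collection)
    # Anything else (strings, None, ...) is passed through unchanged.
    return collection


metrics = {"loss": [0.5, 1.5], "extras": ([2.0, 4.0], "tag")}
reduced = mean_reduce(metrics)
print(reduced)
```

The result keeps the same nesting as the input, which matches the documented behaviour of returning the same type that was passed in.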

barrier(*args, **kwargs)

Synchronizes all processes by blocking each one until the whole group has entered this function.

Parameters:

name – an optional name to pass into barrier.

Return type:

None
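
The blocking behaviour described for a barrier can be sketched with the standard library alone. The example below is an illustration of the general concept, not of this strategy's code: threads stand in for processes, and threading.Barrier stands in for the distributed barrier. The names NUM_WORKERS, worker, and events are hypothetical.

```python
import threading

NUM_WORKERS = 3  # hypothetical group size
barrier = threading.Barrier(NUM_WORKERS)
events = []
events_lock = threading.Lock()


def worker(rank):
    with events_lock:
        events.append(("before", rank))
    # Blocks until all NUM_WORKERS threads have reached this call.
    barrier.wait()
    with events_lock:
        events.append(("after", rank))


threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(NUM_WORKERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Every "before" entry is recorded ahead of any "after" entry.
print(events)
```

No worker can record its "after" entry until every worker has recorded its "before" entry, which is the guarantee a barrier provides.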

batch_to_device(batch, device=None)

Moves the batch to the correct device.

The returned batch is of the same type as the input batch, just having all tensors on the correct device.

Parameters:

batch (Any) – The batch of samples to move to the correct device
device (Optional[device]) – The target device

Return type:

Any
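
The "same type in, same type out" contract can be pictured with a small recursive sketch. This is an assumption-laden illustration rather than the library's code: FakeTensor is a hypothetical stand-in for a tensor with a .to(device) method, move_to_device is a hypothetical helper, and only dict, list, and tuple containers are handled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor that knows which device it lives on."""
    values: tuple
    device: str = "cpu"

    def to(self, device):
        # Return a copy placed on the requested device.
        return replace(self, device=device)


def move_to_device(batch, device):
    """Move every FakeTensor inside a nested batch, preserving container types."""
    if isinstance(batch, FakeTensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return {key: move_to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(item, device) for item in batch)
    # Non-tensor leaves (strings, ints, ...) are returned unchanged.
    return batch


batch = {"x": FakeTensor((1, 2)), "pair": [FakeTensor((3,)), 7], "meta": "id-7"}
moved = move_to_device(batch, "cuda:0")
print(moved)
```

The returned object has the same layout as the input; only the device attribute of the tensor-like leaves changes.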

broadcast(obj, src=0)

Broadcasts an object to all processes.

Parameters:

obj (TypeVar(TBroadcast)) – the object to broadcast
src (int) – source rank

Return type:

TypeVar(TBroadcast)

module_to_device(module)

Moves the model to the correct device.

Return type:

None

reduce_boolean_decision(decision, all=True)

Reduces a boolean decision over distributed processes. By default, it is analogous to all from the standard library, returning True only if all input decisions evaluate to True. If all is set to False, it behaves like any instead.

Parameters:

decision (bool) – A single input decision.
all (bool) – Whether to logically emulate all or any. Defaults to True.

Returns:

The reduced boolean decision.

Return type:

bool
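
The all/any semantics described above can be shown with a short sketch. It is a conceptual illustration only: a list holds one decision per process, the helper name reduce_decisions is hypothetical, and the parameter is called use_all here (instead of all) to avoid shadowing the built-in.

```python
def reduce_decisions(decisions, use_all=True):
    """Combine one boolean per process into a single decision."""
    return all(decisions) if use_all else any(decisions)


per_process = [True, True, False]  # hypothetical decisions from three processes

print(reduce_decisions(per_process))                 # False: not every process agreed
print(reduce_decisions(per_process, use_all=False))  # True: at least one process agreed
```

A typical use of this pattern is deciding whether to stop early: with the default, everyone must agree; with the alternative, a single vote is enough.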

setup_module(module)

Wraps the given model into a DataParallel module.

Return type:

DataParallel

property distributed_sampler_kwargs: None

Arguments for the DistributedSampler.

If this property is not defined, or it returns None, the DistributedSampler will not be used.

property root_device: torch.device

Returns the root device.