
DataParallelStrategy

class lightning_fabric.strategies.DataParallelStrategy(accelerator=None, parallel_devices=None, checkpoint_io=None, precision=None)[source]

Bases: lightning_fabric.strategies.parallel.ParallelStrategy

Implements data-parallel training in a single process, i.e., the model gets replicated to each device and each gets a split of the data.
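
The strategy is normally selected through Fabric rather than instantiated by hand; a minimal sketch (the accelerator and device count are illustrative assumptions):

    from lightning_fabric import Fabric

    # A minimal sketch (assumes at least two CUDA devices): the "dp" alias
    # selects DataParallelStrategy, which replicates the model within a
    # single process.
    fabric = Fabric(accelerator="cuda", devices=2, strategy="dp")
    fabric.launch()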

all_reduce(collection, group=None, reduce_op='mean')[source]

Reduces the given tensor, or collection of tensors (e.g. across GPUs/processes).

Parameters
  • collection (TypeVar(TReduce)) – the tensor, or collection of tensors, to sync and reduce

  • group (Optional[Any]) – the process group to reduce over

  • reduce_op (Union[ReduceOp, str, None]) – the reduction operation. Defaults to ‘mean’. Can also be the string ‘sum’ or a ReduceOp.

Return type

TypeVar(TReduce)
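
A sketch of the call shape, assuming a CPU-only setup for illustration:

    import torch
    from lightning_fabric.strategies import DataParallelStrategy

    strategy = DataParallelStrategy(parallel_devices=[torch.device("cpu")])
    # Because DP runs in a single process, reducing a scalar with 'mean'
    # yields an equivalent scalar; the loss value is an illustrative assumption.
    loss = torch.tensor(1.5)
    reduced = strategy.all_reduce(loss, reduce_op="mean")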

barrier(*args, **kwargs)[source]

Synchronizes all processes, blocking them until the whole group enters this function.

Parameters

name – an optional name to pass into barrier.

Return type

None
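
Since this strategy runs in a single process, the barrier has no other processes to wait for; a quick sketch:

    from lightning_fabric.strategies import DataParallelStrategy

    strategy = DataParallelStrategy()
    # With only one process in DP, there is no peer to wait for, so the call
    # should return immediately.
    strategy.barrier()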

batch_to_device(batch, device=None)[source]

Moves the batch to the correct device.

The returned batch is of the same type as the input batch, just having all tensors on the correct device.

Parameters
  • batch (Any) – The batch of samples to move to the correct device

  • device (Optional[device]) – The target device

Return type

Any
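
A sketch, assuming a dict-shaped batch for illustration:

    import torch
    from lightning_fabric.strategies import DataParallelStrategy

    strategy = DataParallelStrategy(parallel_devices=[torch.device("cpu")])
    # The batch keeps its structure (a dict here, an illustrative choice);
    # only the tensors inside it are moved to the target device.
    batch = {"x": torch.randn(4, 3), "y": torch.randint(0, 2, (4,))}
    batch = strategy.batch_to_device(batch, device=strategy.root_device)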

broadcast(obj, src=0)[source]

Broadcasts an object to all processes.

Parameters
  • obj (TypeVar(TBroadcast)) – the object to broadcast

  • src (int) – source rank

Return type

TypeVar(TBroadcast)
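
In this single-process strategy there are no other ranks to receive the object; a sketch of the call shape:

    from lightning_fabric.strategies import DataParallelStrategy

    strategy = DataParallelStrategy()
    # With a single process there are no peers to send to, so the object is
    # expected to come back unchanged.
    obj = strategy.broadcast({"epoch": 3}, src=0)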

module_to_device(module)[source]

Moves the model to the correct device.

Return type

None
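
A sketch with a hypothetical toy model:

    import torch
    from torch import nn
    from lightning_fabric.strategies import DataParallelStrategy

    strategy = DataParallelStrategy(parallel_devices=[torch.device("cpu")])
    model = nn.Linear(10, 2)  # an illustrative toy model
    # Moves the model's parameters and buffers to the strategy's root device.
    strategy.module_to_device(model)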

reduce_boolean_decision(decision, all=True)[source]

Reduces a boolean decision over distributed processes. By default, this is analogous to all from the standard library, returning True only if all input decisions evaluate to True. If all is set to False, it behaves like any instead.

Parameters
  • decision (bool) – A single input decision.

  • all (bool) – Whether to logically emulate all or any. Defaults to True.

Returns

The reduced boolean decision.

Return type

bool
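
A sketch of early-stopping-style coordination, assuming a CPU-only setup; the decision value is illustrative:

    import torch
    from lightning_fabric.strategies import DataParallelStrategy

    strategy = DataParallelStrategy(parallel_devices=[torch.device("cpu")])
    # With a single process, the reduced decision equals the local one.
    should_stop = strategy.reduce_boolean_decision(True, all=True)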

setup_module(module)[source]

Wraps the given model into a DataParallel module.

Return type

DataParallel
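
A sketch of the wrapping step; the two CUDA devices and the toy model are illustrative assumptions (torch.nn.DataParallel targets GPUs):

    import torch
    from torch import nn
    from lightning_fabric.strategies import DataParallelStrategy

    # Assumes two CUDA devices are available.
    strategy = DataParallelStrategy(
        parallel_devices=[torch.device("cuda", 0), torch.device("cuda", 1)]
    )
    model = nn.Linear(10, 2)  # an illustrative toy model
    dp_model = strategy.setup_module(model)  # a torch.nn.DataParallel wrapper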

property distributed_sampler_kwargs: None

Arguments for the DistributedSampler.

If this property is not defined, or it returns None, the DistributedSampler will not be used.

Return type

None
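
A quick check of the behavior described above:

    from lightning_fabric.strategies import DataParallelStrategy

    # DP splits each batch across devices inside the forward pass, so no
    # DistributedSampler configuration is needed.
    assert DataParallelStrategy().distributed_sampler_kwargs is None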

property root_device: torch.device

Returns the root device.

Return type

device
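
A sketch, assuming two CUDA devices for illustration:

    import torch
    from lightning_fabric.strategies import DataParallelStrategy

    strategy = DataParallelStrategy(
        parallel_devices=[torch.device("cuda", 0), torch.device("cuda", 1)]
    )
    strategy.root_device  # the first configured device, here cuda:0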

