thunder.plugins.DDP

class thunder.plugins.DDP(bucket_size_in_mb=25.0, broadcast_from=None, process_group=None)[source]

Bases: Plugin

Plugin for enabling Distributed Data Parallel (DDP) training in Thunder.

This plugin applies the necessary transforms to bucket and synchronize gradients across multiple processes, using a specified process group for communication.

See https://github.com/pytorch/pytorch/blob/v2.7.0/torch/nn/parallel/distributed.py#L326 for more details.

Parameters:
  • bucket_size_in_mb (float) – Size in megabytes of each gradient bucket used by DDP. Default: 25.0.

  • broadcast_from (Optional[int]) – Global rank to broadcast the model parameters from at initialization. If None, no explicit broadcast is performed. Default: None.

  • process_group (Optional[ProcessGroup]) – Process group used for gradient synchronization. Defaults to the current default process group.
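
Example

A minimal usage sketch, assuming thunder.jit accepts plugin instances through a plugins argument and that the script is launched with torchrun; the argument name and launch details should be checked against the installed Thunder version.

import os

import torch
import torch.distributed as dist
import thunder
from thunder.plugins import DDP

dist.init_process_group(backend="nccl")     # default process group used by the plugin
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).to(f"cuda:{local_rank}")

# Bucket gradients into 50 MB buckets and broadcast parameters from global rank 0.
jitted = thunder.jit(model, plugins=[DDP(bucket_size_in_mb=50.0, broadcast_from=0)])

x = torch.randn(8, 1024, device=f"cuda:{local_rank}")
jitted(x).sum().backward()                  # gradients are all-reduced across ranks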

__init__(bucket_size_in_mb=25.0, broadcast_from=None, process_group=None)[source]

Methods

__init__([bucket_size_in_mb, ...])

setup_executors()

Return type: Optional[list[Executor]]

setup_lookasides()

Return type: Optional[list[Lookaside]]

setup_transforms()

Constructs the list of graph-level transforms.

Attributes

policy

setup_transforms()[source]

Constructs the list of graph-level transforms.

Returns:

A list[Transform] containing the DDP transform configured for the given process group.
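
For illustration, the transforms can also be constructed directly and passed to thunder.jit. This is a sketch under assumptions: a default process group is already initialized (e.g. via torchrun) and thunder.jit accepts an explicit transforms list.

import torch
import thunder
from thunder.plugins import DDP

plugin = DDP(bucket_size_in_mb=25.0, broadcast_from=0)
ddp_transforms = plugin.setup_transforms()   # list[Transform] holding the DDP transform

model = torch.nn.Linear(16, 16)
# Apply the plugin's graph-level transform without going through the plugin machinery.
jitted = thunder.jit(model, transforms=ddp_transforms)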