thunder.plugins.DDP

class thunder.plugins.DDP(bucket_size_in_mb=25.0, broadcast_from=None, process_group=None)[source]

Bases: Plugin

Plugin for enabling Distributed Data Parallel (DDP) training in Thunder.

This plugin applies the necessary transforms to bucket and synchronize gradients across multiple processes, using a specified process group for communication.

See https://github.com/pytorch/pytorch/blob/v2.7.0/torch/nn/parallel/distributed.py#L326 for more details.

Parameters:
  • bucket_size_in_mb (float) – Size in megabytes of each gradient bucket used by DDP. Default: 25.0.

  • broadcast_from (Optional[int]) – Global rank to broadcast the model parameters from at initialization. If None, no explicit broadcast is performed. Default: None.

  • process_group (Optional[ProcessGroup]) – Process group used for gradient synchronization. Defaults to the current default process group.
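
Example

A minimal usage sketch, assuming thunder.jit accepts plugin instances through a plugins argument and that the script is launched with torchrun; the argument name and launch details should be checked against the installed Thunder version.

import os

import torch
import torch.distributed as dist
import thunder
from thunder.plugins import DDP

dist.init_process_group(backend="nccl")     # default process group used by the plugin
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).to(f"cuda:{local_rank}")

# Bucket gradients into 50 MB buckets and broadcast parameters from global rank 0.
jitted = thunder.jit(model, plugins=[DDP(bucket_size_in_mb=50.0, broadcast_from=0)])

x = torch.randn(8, 1024, device=f"cuda:{local_rank}")
jitted(x).sum().backward()                  # gradients are all-reduced across ranks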

__init__(bucket_size_in_mb=25.0, broadcast_from=None, process_group=None)[source]

Methods

__init__([bucket_size_in_mb, ...])

setup_executors()

Return type: Optional[list[Executor]]

setup_lookasides()

Return type: Optional[list[Lookaside]]

setup_transforms()

Constructs the list of graph-level transforms.

Attributes

policy

setup_transforms()[source]

Constructs the list of graph-level transforms.

Returns:

A list[Transform] containing the DDP transform configured for the given process group.
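
For illustration, the transforms can also be constructed directly and passed to thunder.jit. This is a sketch under assumptions: a default process group is already initialized (e.g. via torchrun) and thunder.jit accepts an explicit transforms list.

import torch
import thunder
from thunder.plugins import DDP

plugin = DDP(bucket_size_in_mb=25.0, broadcast_from=0)
ddp_transforms = plugin.setup_transforms()   # list[Transform] holding the DDP transform

model = torch.nn.Linear(16, 16)
# Apply the plugin's graph-level transform without going through the plugin machinery.
jitted = thunder.jit(model, transforms=ddp_transforms)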