LightningEnvironment

class lightning.pytorch.plugins.environments.LightningEnvironment

Bases: lightning.fabric.plugins.environments.cluster_environment.ClusterEnvironment

The default environment used by Lightning for a single node or an unmanaged cluster (one without a job scheduler).

There are two modes the Lightning environment can operate with:

  1. The user launches only the main process with python train.py ... and sets no additional environment variables. Lightning then spawns new worker processes for distributed training on the current node.

  2. The user launches all processes manually or with a utility such as torch.distributed.launch. The appropriate environment variables must be set for each process, at minimum LOCAL_RANK.

If the main address and port are not provided, the default environment will choose them automatically. It is recommended to use this default environment for single-node distributed training as it provides a convenient way to launch the training script.
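As an illustration of the second mode, the following minimal sketch prepares one environment mapping per worker process on a single node. The helper build_worker_envs is hypothetical and not part of Lightning. Only LOCAL_RANK is named in the text above; the other variable names (NODE_RANK, MASTER_ADDR, MASTER_PORT) and the port value are assumptions that follow common torch.distributed conventions.

```python
import os


def build_worker_envs(num_processes, main_address="127.0.0.1", main_port=29500, node_rank=0):
    """Return one environment mapping per worker process on a single node.

    Hypothetical helper: it only prepares variables and launches nothing.
    """
    envs = []
    for local_rank in range(num_processes):
        env = dict(os.environ)  # inherit the parent's environment
        env["LOCAL_RANK"] = str(local_rank)  # the minimum required by the text
        env["NODE_RANK"] = str(node_rank)  # assumed variable name
        env["MASTER_ADDR"] = main_address  # assumed variable name
        env["MASTER_PORT"] = str(main_port)  # assumed variable name, arbitrary port
        envs.append(env)
    return envs


worker_envs = build_worker_envs(num_processes=2)
print([env["LOCAL_RANK"] for env in worker_envs])
```

Each mapping could then be passed as the env argument of subprocess.Popen when starting the training script, so that every process sees its own LOCAL_RANK.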

static detect()

Detects the environment settings corresponding to this cluster and returns True if they match.

Return type

bool

global_rank()

The rank (index) of the currently running process across all nodes and devices.

Return type

int

local_rank()

The rank (index) of the currently running process within the current node.

Return type

int

node_rank()

The rank (index) of the node on which the current process runs.

Return type

int

teardown()

Cleans up any state that was set, once execution has finished.

Return type

None

world_size()

The number of processes across all devices and nodes.

Return type

int
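The relationship between the rank values and the world size can be sketched with a little arithmetic. This is an illustration under two assumptions that the text above does not state: every node runs the same number of processes, and global ranks are assigned contiguously node by node. The function names are hypothetical and are not the methods of this class.

```python
def compute_global_rank(node_rank, local_rank, processes_per_node):
    """Global index of a process, assuming contiguous numbering per node."""
    return node_rank * processes_per_node + local_rank


def compute_world_size(num_nodes, processes_per_node):
    """Total number of processes, assuming equally sized nodes."""
    return num_nodes * processes_per_node


# Two nodes with four processes each.
ranks = [
    compute_global_rank(node, local, processes_per_node=4)
    for node in range(2)
    for local in range(4)
]
total = compute_world_size(num_nodes=2, processes_per_node=4)
print(ranks, total)
```

Under these assumptions the global ranks cover 0 through world size minus one exactly once, which is what collective communication backends generally expect.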

property creates_processes_externally: bool

Returns whether the processes are created externally (by the user or a cluster) rather than by Lightning.

If at least LOCAL_RANK is available as an environment variable, Lightning assumes the user acts as the process launcher or job scheduler, and Lightning will not launch new processes.

Return type

bool
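The rule described for this property can be sketched as a simple lookup. The function below is a hypothetical stand-in written for illustration, not the library's implementation; it takes the environment as an argument so that it can be tried without modifying os.environ.

```python
import os


def launched_externally(environ=None):
    """Return True when LOCAL_RANK is present in the given environment."""
    environ = os.environ if environ is None else environ
    return "LOCAL_RANK" in environ


no_launcher = launched_externally({})
with_launcher = launched_externally({"LOCAL_RANK": "0"})
print(no_launcher, with_launcher)
```

With an empty environment the sketch reports that no external launcher is present, which corresponds to the first mode, where Lightning spawns the workers itself.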

property main_address: str

The main address through which all processes connect and communicate.

Return type

str

property main_port: int

An open and configured port in the main node through which all processes communicate.

Return type

int
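When no port is provided, the class description says the environment chooses one automatically. One common way to find an open port, shown here as a sketch and not necessarily how Lightning does it, is to bind a socket to port 0 and let the operating system pick. The name pick_free_port is hypothetical.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the operating system for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "any free port"
        return sock.getsockname()[1]  # the port the OS actually assigned


port = pick_free_port()
print(port)
```

The port is released when the socket closes, so another program could claim it before the training processes connect. For multi-node runs, setting the port explicitly avoids that race and ensures every node agrees on the same value.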