LSFEnvironment

class lightning.pytorch.plugins.environments.LSFEnvironment

Bases: lightning.fabric.plugins.environments.cluster_environment.ClusterEnvironment

An environment for running on clusters managed by the LSF resource manager.

Any job that uses this ClusterEnvironment is expected to have been launched with the Job Step Manager, i.e. jsrun.

This plugin expects the following environment variables:

LSB_JOBID

The LSF-assigned job ID.

LSB_DJOB_RANKFILE

The OpenMPI-compatible rank file for the LSF job.

JSM_NAMESPACE_LOCAL_RANK

The node-local rank for the task. This environment variable is set by jsrun.

JSM_NAMESPACE_SIZE

The world size for the task. This environment variable is set by jsrun.

JSM_NAMESPACE_RANK

The global rank for the task. This environment variable is set by jsrun.
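As a rough illustration of how the three jsrun variables listed above map to rank values, the following sketch parses them into integers. The helper name read_jsm_ranks and the dictionary keys are hypothetical and not part of the Lightning API; only the environment variable names come from this page.

```python
import os
from typing import Dict, Mapping

# Environment variables documented above, keyed by the value they carry.
# The keys and this helper are illustrative, not part of the library.
JSM_RANK_VARS = {
    "global_rank": "JSM_NAMESPACE_RANK",
    "local_rank": "JSM_NAMESPACE_LOCAL_RANK",
    "world_size": "JSM_NAMESPACE_SIZE",
}


def read_jsm_ranks(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Parse the jsrun-provided rank variables into integers."""
    ranks = {}
    for name, var in JSM_RANK_VARS.items():
        value = env.get(var)
        if value is None:
            raise ValueError(f"{var} is not set; was the job launched with jsrun?")
        ranks[name] = int(value)
    return ranks
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise outside an LSF allocation.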

static detect()

Returns True if the current process was launched using the jsrun command.

Return type

bool
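One plausible way to perform such a detection is to check that the environment variables this plugin expects are all present. The sketch below assumes that approach; the function name looks_like_jsrun is hypothetical, and the library's actual check may use a different set of variables.

```python
import os
from typing import Mapping

# Assumption: presence of these variables is treated as evidence of a jsrun launch.
REQUIRED_VARS = frozenset(
    {
        "LSB_JOBID",
        "LSB_DJOB_RANKFILE",
        "JSM_NAMESPACE_LOCAL_RANK",
        "JSM_NAMESPACE_SIZE",
        "JSM_NAMESPACE_RANK",
    }
)


def looks_like_jsrun(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if every expected LSF/jsrun variable is set."""
    return REQUIRED_VARS.issubset(env.keys())
```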

global_rank()

The global rank is read from the environment variable JSM_NAMESPACE_RANK.

Return type

int

local_rank()

The local rank is read from the environment variable JSM_NAMESPACE_LOCAL_RANK.

Return type

int

node_rank()

The node rank is determined by the position of the current hostname in the OpenMPI host rank file stored in LSB_DJOB_RANKFILE.

Return type

int
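The position lookup described for node_rank() can be sketched as follows. This is a minimal illustration that assumes the rank file lists one hostname per line, with a hostname repeated once per slot on that node. The helper names are hypothetical, and a real rank file may contain a launch-node entry that this sketch does not treat specially.

```python
import socket
from typing import List, Optional


def read_hosts(rankfile_path: str) -> List[str]:
    """Return the non-empty hostnames listed in the rank file, in order."""
    with open(rankfile_path) as f:
        return [line.strip() for line in f if line.strip()]


def node_rank_from_hosts(hosts: List[str], hostname: Optional[str] = None) -> int:
    """Return the index of hostname among the unique hosts, in first-seen order."""
    hostname = hostname or socket.gethostname()
    # dict.fromkeys preserves insertion order, so repeated hostnames collapse
    # to their first occurrence.
    unique_hosts = list(dict.fromkeys(hosts))
    # Raises ValueError if the hostname does not appear in the rank file.
    return unique_hosts.index(hostname)
```

In an LSF job, the path would come from os.environ["LSB_DJOB_RANKFILE"].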

world_size()

The world size is read from the environment variable JSM_NAMESPACE_SIZE.

Return type

int

property creates_processes_externally: bool

LSF creates subprocesses, i.e., PyTorch Lightning does not need to spawn them.

Return type

bool

property main_address: str

The main address is read from the OpenMPI host rank file whose path is stored in the environment variable LSB_DJOB_RANKFILE.

Return type

str

property main_port: int

The main port is calculated from the LSF job ID.

Return type

int
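Deriving the port from the job ID lets every task in a job arrive at the same port without communicating. A minimal sketch of one such mapping follows; the function name and the base and span values are illustrative assumptions, not necessarily the formula the library uses.

```python
def port_from_job_id(job_id: str, base: int = 10000, span: int = 1000) -> int:
    """Map a job ID to a port in the range [base, base + span).

    Assumption: base and span are illustrative defaults chosen for this sketch.
    """
    return base + int(job_id) % span
```

With these defaults, port_from_job_id("123456") evaluates to 10456. In an LSF job, the argument would be os.environ["LSB_JOBID"].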