lightning.fabric.plugins.environments.lsf.LSFEnvironment

class lightning.fabric.plugins.environments.lsf.LSFEnvironment

Bases: ClusterEnvironment

An environment for running on clusters managed by the LSF resource manager.

Any process that uses this ClusterEnvironment is expected to have been launched with the Job Step Manager, i.e., jsrun.

This plugin expects the following environment variables:

LSB_JOBID

The LSF-assigned job ID.

LSB_DJOB_RANKFILE

The OpenMPI-compatible rank file for the LSF job.

JSM_NAMESPACE_LOCAL_RANK

The node-local rank of the task. This environment variable is set by jsrun.

JSM_NAMESPACE_SIZE

The world size for the task. This environment variable is set by jsrun.

JSM_NAMESPACE_RANK

The global rank of the task. This environment variable is set by jsrun.
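To exercise code that depends on these variables outside an LSF cluster, it can help to fabricate them. The sketch below is a hypothetical test helper (`make_fake_lsf_env` is not part of Lightning): it writes the given hosts one per line to a temporary file, which is an assumed layout for the rank file, and returns the five variables as a dict. Note that LSB_DJOB_RANKFILE holds the path to the file, not its contents.

```python
import os
import tempfile
from typing import Dict, List
from unittest import mock


def make_fake_lsf_env(
    hosts: List[str], local_rank: int, global_rank: int, world_size: int, job_id: int = 1234
) -> Dict[str, str]:
    """Build a hypothetical set of LSF/jsrun variables for local testing.

    The hosts are written one per line to a temporary rank file; this layout
    is an assumption made for the sketch. The file is not deleted afterwards.
    """
    fd, rankfile = tempfile.mkstemp(prefix="rankfile_", text=True)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(hosts) + "\n")
    return {
        "LSB_JOBID": str(job_id),
        "LSB_DJOB_RANKFILE": rankfile,  # path to the rank file, not its contents
        "JSM_NAMESPACE_LOCAL_RANK": str(local_rank),
        "JSM_NAMESPACE_SIZE": str(world_size),
        "JSM_NAMESPACE_RANK": str(global_rank),
    }


if __name__ == "__main__":
    env = make_fake_lsf_env(["launch01", "node01", "node02"], local_rank=0, global_rank=1, world_size=2)
    # Patch os.environ only inside the with-block so the real environment is untouched.
    with mock.patch.dict(os.environ, env):
        print(os.environ["JSM_NAMESPACE_RANK"])  # prints "1"
```

Using `mock.patch.dict` keeps the fake variables scoped to a single test, so other code in the same process never sees them.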

_get_main_address()

A helper for getting the main address.

The main address is assigned to the first node in the list of nodes used for the job.

Return type:

str
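A minimal sketch of the rule above, assuming the host list has already been read from the rank file (`main_address_from_hosts` is a hypothetical name). Because every rank reads the same file, every rank picks the same host without any communication.

```python
from typing import List


def main_address_from_hosts(hosts: List[str]) -> str:
    """Pick the main address as the first host in the job's host list."""
    if not hosts:
        raise ValueError("the host list is empty")
    # All ranks see the same ordered list, so they all agree on hosts[0].
    return hosts[0]


if __name__ == "__main__":
    print(main_address_from_hosts(["node01", "node01", "node02"]))  # prints "node01"
```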

static _get_main_port()

A helper function for accessing the main port.

Uses the LSF job ID so all ranks can compute the main port.

Return type:

int
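The idea of deriving the port from the job ID can be sketched as a pure function. The specific mapping here (job ID modulo 1000, offset by 10000) and the name `main_port_from_job_id` are illustrative assumptions, not necessarily what Lightning computes; what matters is that the result depends only on LSB_JOBID, so all ranks arrive at the same port.

```python
import os


def main_port_from_job_id(job_id: str, base: int = 10000, span: int = 1000) -> int:
    """Derive a port in [base, base + span) from an LSF job ID.

    base and span are illustrative assumptions. The result depends only on
    the job ID, so every rank of the same job computes the same port.
    """
    return int(job_id) % span + base


if __name__ == "__main__":
    print(main_port_from_job_id("123456"))  # prints 10456
    if "LSB_JOBID" in os.environ:
        print(main_port_from_job_id(os.environ["LSB_JOBID"]))
```

The modulo keeps the port in a fixed range, at the cost that two jobs whose IDs share the same remainder would pick the same port if they ever landed on the same main node.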

_get_node_rank()

A helper method for getting the node rank.

The node rank is determined by the position of the current node in the list of hosts used in the job. This is calculated by reading all hosts from LSB_DJOB_RANKFILE and finding this node’s hostname in the list.

Return type:

int
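A sketch of the lookup described above, under the assumption that a hostname can appear several times in the host list (for example once per task slot), so positions are counted over distinct hosts in first-seen order. `node_rank_from_hosts` is a hypothetical helper, not Lightning's method.

```python
import socket
from typing import Dict, List, Optional


def node_rank_from_hosts(hosts: List[str], hostname: Optional[str] = None) -> int:
    """Return the position of `hostname` among the distinct hosts, in first-seen order."""
    hostname = hostname or socket.gethostname()
    order: Dict[str, int] = {}
    for host in hosts:
        # A new host gets the next index; repeated entries keep their first index.
        order.setdefault(host, len(order))
    if hostname not in order:
        raise ValueError(f"{hostname!r} is not in the host list")
    return order[hostname]


if __name__ == "__main__":
    print(node_rank_from_hosts(["node01", "node01", "node02", "node02"], "node02"))  # prints 1
```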

static _read_hosts()

Reads the compute hosts that are part of the job.

LSF uses the Job Step Manager (JSM) to manage job steps, which JSM executes from “launch” nodes. Each job is assigned one launch node, and that launch node is the first node listed in LSB_DJOB_RANKFILE.

Return type:

list[str]
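A minimal sketch of reading the hosts, split into a pure parser and a thin file reader so the parsing can be tested without a cluster. Both function names are hypothetical, and dropping the first entry encodes an assumption: that the launch node described above runs no training ranks and therefore should not count as a compute host.

```python
import os
from typing import Iterable, List


def parse_compute_hosts(lines: Iterable[str]) -> List[str]:
    """Strip each line, skip blank lines, and drop the leading launch node (assumption)."""
    hosts = [line.strip() for line in lines if line.strip()]
    return hosts[1:]


def read_compute_hosts() -> List[str]:
    """Read the rank file whose path is stored in LSB_DJOB_RANKFILE."""
    rankfile = os.environ.get("LSB_DJOB_RANKFILE")
    if not rankfile:
        raise ValueError("LSB_DJOB_RANKFILE is missing or empty")
    with open(rankfile) as f:
        return parse_compute_hosts(f)


if __name__ == "__main__":
    print(parse_compute_hosts(["launch01\n", "node01\n", "node02\n"]))  # prints ['node01', 'node02']
```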

static detect()

Returns True if the current process was launched using the jsrun command.

Return type:

bool
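One plausible way to approximate this check is to test whether the LSF and jsrun variables listed at the top of this page are all set. This is a sketch under that assumption; the exact set of variables the library inspects may differ, and `looks_like_jsrun` is a hypothetical name. Accepting an optional mapping makes it testable without touching `os.environ`.

```python
import os
from typing import Mapping, Optional

# Assumption: the variables documented for this plugin serve as the jsrun signature.
REQUIRED_VARS = (
    "LSB_JOBID",
    "LSB_DJOB_RANKFILE",
    "JSM_NAMESPACE_LOCAL_RANK",
    "JSM_NAMESPACE_SIZE",
    "JSM_NAMESPACE_RANK",
)


def looks_like_jsrun(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if every required LSF/jsrun variable is present."""
    env = os.environ if environ is None else environ
    return all(var in env for var in REQUIRED_VARS)


if __name__ == "__main__":
    print(looks_like_jsrun())
```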

global_rank()

The global rank is read from the environment variable JSM_NAMESPACE_RANK.

Return type:

int

local_rank()

The local rank is read from the environment variable JSM_NAMESPACE_LOCAL_RANK.

Return type:

int

node_rank()

The node rank is determined by the position of the current hostname in the OpenMPI host rank file whose path is stored in LSB_DJOB_RANKFILE.

Return type:

int

world_size()

The world size is read from the environment variable JSM_NAMESPACE_SIZE.

Return type:

int
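The three rank getters above each read a single integer that jsrun sets. A minimal sketch that gathers them in one place, with hypothetical names (`JsrunRanks`, `read_jsrun_ranks`) and an extra sanity check on the global rank that is this sketch's own addition:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class JsrunRanks:
    global_rank: int  # from JSM_NAMESPACE_RANK
    local_rank: int   # from JSM_NAMESPACE_LOCAL_RANK
    world_size: int   # from JSM_NAMESPACE_SIZE


def read_jsrun_ranks(environ: Optional[Mapping[str, str]] = None) -> JsrunRanks:
    """Read the jsrun rank variables; raises KeyError if one is missing."""
    env = os.environ if environ is None else environ
    ranks = JsrunRanks(
        global_rank=int(env["JSM_NAMESPACE_RANK"]),
        local_rank=int(env["JSM_NAMESPACE_LOCAL_RANK"]),
        world_size=int(env["JSM_NAMESPACE_SIZE"]),
    )
    # Extra check added in this sketch: a global rank must lie in [0, world_size).
    if not 0 <= ranks.global_rank < ranks.world_size:
        raise ValueError(f"inconsistent ranks: {ranks}")
    return ranks


if __name__ == "__main__":
    sample = {"JSM_NAMESPACE_RANK": "3", "JSM_NAMESPACE_LOCAL_RANK": "1", "JSM_NAMESPACE_SIZE": "4"}
    print(read_jsrun_ranks(sample))
```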

property creates_processes_externally: bool

LSF creates subprocesses, i.e., PyTorch Lightning does not need to spawn them.

property main_address: str

The main address is read from the OpenMPI host rank file whose path is stored in the environment variable LSB_DJOB_RANKFILE.

property main_port: int

The main port is calculated from the LSF job ID.