lightning.fabric.plugins.environments.lsf.LSFEnvironment
- class lightning.fabric.plugins.environments.lsf.LSFEnvironment
Bases: ClusterEnvironment
An environment for running on clusters managed by the LSF resource manager.
Jobs that use this ClusterEnvironment are expected to be launched with the Job Step Manager, i.e. jsrun.
This plugin expects the following environment variables:
LSB_JOBID
The LSF assigned job ID
LSB_DJOB_RANKFILE
The OpenMPI compatible rank file for the LSF job
JSM_NAMESPACE_LOCAL_RANK
The node local rank for the task. This environment variable is set by jsrun.
JSM_NAMESPACE_SIZE
The world size for the task. This environment variable is set by jsrun.
JSM_NAMESPACE_RANK
The global rank for the task. This environment variable is set by jsrun.
- _get_main_address()
A helper for getting the main address.
The main address is assigned to the first node in the list of nodes used for the job.
- Return type:
- static _get_main_port()
A helper function for accessing the main port.
Uses the LSF job ID so all ranks can compute the main port.
- Return type:
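Because every rank sees the same LSB_JOBID, a port derived from it is identical on all ranks without any communication. A minimal sketch of that idea follows. The helper name `port_from_job_id` and the mapping into the range 10000 to 10999 are illustrative assumptions, not a statement of what the plugin does internally.

```python
import os


def port_from_job_id(env=None):
    """Derive a deterministic port from LSB_JOBID (hypothetical helper)."""
    env = os.environ if env is None else env
    job_id = int(env["LSB_JOBID"])
    # Assumed mapping: fold the job ID into the range 10000-10999 so the
    # result stays clear of privileged ports. Every rank computes the same value.
    return job_id % 1000 + 10000
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise outside an LSF job.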
- _get_node_rank()
A helper method for getting the node rank.
The node rank is determined by the position of the current node in the list of hosts used in the job. This is calculated by reading all hosts from LSB_DJOB_RANKFILE and finding this node’s hostname in the list.
- Return type:
- static _read_hosts()
Read compute hosts that are a part of the compute job.
LSF uses the Job Step Manager (JSM) to manage job steps. Job steps are executed by the JSM from “launch” nodes. Each job is assigned a launch node. This launch node will be the first node in the list contained in LSB_DJOB_RANKFILE.
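The host handling described for the helpers above can be sketched with the standard library alone. This is a sketch under stated assumptions: the rank file is taken to hold one hostname per line (repeated once per slot), the first entry is treated as the launch node and skipped, and the first remaining host is used as the main address. The function names `read_hosts`, `node_rank_of`, and `main_address` are hypothetical and are not part of the plugin's API.

```python
def read_hosts(rankfile_path):
    """Return the compute hosts listed in a rank file (assumed: one hostname per line)."""
    with open(rankfile_path) as f:
        hosts = [line.strip() for line in f if line.strip()]
    # Assumption: the first entry is the launch node, which runs no tasks.
    return hosts[1:]


def node_rank_of(hostname, hosts):
    """Position of `hostname` among the distinct hosts, in order of first appearance."""
    unique = list(dict.fromkeys(hosts))  # de-duplicate while preserving order
    return unique.index(hostname)


def main_address(hosts):
    """Assumption: the first compute host acts as the main address."""
    return hosts[0]
```

Inside a job, the inputs would typically come from `os.environ["LSB_DJOB_RANKFILE"]` and `socket.gethostname()`. `node_rank_of` raises `ValueError` if the hostname is not in the list, which surfaces a mismatch between the rank file and the local hostname early.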
- static detect()
Returns True if the current process was launched using the jsrun command.
- Return type:
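One plausible way to recognize a jsrun launch is to check that the environment variables this plugin expects are all present. The sketch below assumes that presence check is sufficient. `looks_like_jsrun` and `REQUIRED_VARS` are hypothetical names, and the actual detection logic may differ.

```python
import os

# The variables this page lists as expected by the plugin.
REQUIRED_VARS = (
    "LSB_JOBID",
    "LSB_DJOB_RANKFILE",
    "JSM_NAMESPACE_LOCAL_RANK",
    "JSM_NAMESPACE_SIZE",
    "JSM_NAMESPACE_RANK",
)


def looks_like_jsrun(env=None):
    """Return True if every expected LSF/JSM variable is set (assumed heuristic)."""
    env = os.environ if env is None else env
    return all(var in env for var in REQUIRED_VARS)
```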
- global_rank()
The global rank is read from the environment variable JSM_NAMESPACE_RANK.
- Return type:
- local_rank()
The local rank is read from the environment variable JSM_NAMESPACE_LOCAL_RANK.
- Return type:
- node_rank()
The node rank is determined by the position of the current hostname in the OpenMPI host rank file stored in LSB_DJOB_RANKFILE.
- Return type:
- world_size()
The world size is read from the environment variable JSM_NAMESPACE_SIZE.
- Return type:
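The global rank, local rank, and world size all come straight from variables set by jsrun, so reading them amounts to three integer conversions. A minimal sketch, in which the `JsmRanks` container and the `read_jsm_ranks` helper are hypothetical names introduced only for illustration:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JsmRanks:
    global_rank: int
    local_rank: int
    world_size: int


def read_jsm_ranks(env=None):
    """Read the task's ranks and world size from the JSM_NAMESPACE_* variables."""
    env = os.environ if env is None else env
    return JsmRanks(
        global_rank=int(env["JSM_NAMESPACE_RANK"]),
        local_rank=int(env["JSM_NAMESPACE_LOCAL_RANK"]),
        world_size=int(env["JSM_NAMESPACE_SIZE"]),
    )
```

A missing variable raises `KeyError`, which is usually preferable to silently defaulting to rank 0 in a distributed job.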
- property creates_processes_externally: bool
LSF creates subprocesses, i.e., PyTorch Lightning does not need to spawn them.