KubeflowEnvironment

class lightning.pytorch.plugins.environments.KubeflowEnvironment

Bases: ClusterEnvironment

Environment for distributed training using the PyTorchJob operator from Kubeflow.

Unlike other cluster environments, this one is not auto-detected. Pass an instance to the Fabric or Trainer constructor explicitly.
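As a minimal sketch of passing the environment by hand, the helper below builds a Trainer with a KubeflowEnvironment instance in its `plugins` argument. The helper name `build_trainer` and the values chosen for `accelerator`, `devices`, `num_nodes`, and `strategy` are illustrative assumptions, not recommendations from this page.

```python
def build_trainer(num_nodes: int = 2):
    """Return a Trainer that uses KubeflowEnvironment explicitly.

    `build_trainer` is a hypothetical helper; the argument values are
    placeholders to adapt to the actual PyTorchJob spec.
    """
    # Imports are deferred so this module can be loaded on machines
    # where Lightning is not installed.
    from lightning.pytorch import Trainer
    from lightning.pytorch.plugins.environments import KubeflowEnvironment

    return Trainer(
        accelerator="cpu",   # assumption: CPU-only pods
        devices=1,           # assumption: one process per pod
        num_nodes=num_nodes, # assumption: should match the number of pods
        strategy="ddp",
        # The environment is not auto-detected, so it is passed here.
        plugins=[KubeflowEnvironment()],
    )
```

The helper is meant to be called from the training script that runs inside a PyTorchJob pod. Fabric also takes a `plugins` argument, so the same pattern should carry over.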

static detect()

Detects the environment settings corresponding to this cluster and returns True if they match.

Return type:

bool

global_rank()

The rank (index) of the currently running process across all nodes and devices.

Return type:

int

local_rank()

The rank (index) of the currently running process within the current node.

Return type:

int

node_rank()

The rank (index) of the node on which the current process runs.

Return type:

int

world_size()

The number of processes across all devices and nodes.

Return type:

int

property creates_processes_externally: bool

Whether the environment, rather than Lightning, creates the training subprocesses.

property main_address: str

The main address through which all processes connect and communicate.

property main_port: int

An open and configured port in the main node through which all processes communicate.
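To make the members above concrete, here is a standard-library sketch of how such values could be derived inside a PyTorchJob pod. It assumes the operator exports `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE`, and `RANK`, that each pod runs a single process, and that each pod counts as one node. The names `PyTorchJobSettings` and `read_pytorchjob_settings` are hypothetical, and this is an illustration rather than the library's implementation.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PyTorchJobSettings:
    """Hypothetical container mirroring the members documented above."""

    main_address: str
    main_port: int
    world_size: int
    global_rank: int
    local_rank: int
    node_rank: int


def read_pytorchjob_settings(env: Mapping[str, str] = os.environ) -> PyTorchJobSettings:
    """Build the settings from environment variables (assumed names)."""
    rank = int(env["RANK"])
    return PyTorchJobSettings(
        main_address=env["MASTER_ADDR"],
        main_port=int(env["MASTER_PORT"]),
        world_size=int(env["WORLD_SIZE"]),
        global_rank=rank,
        local_rank=0,    # assumption: one process per pod
        node_rank=rank,  # assumption: each pod is treated as a node
    )


# Example with made-up values for the second worker of a four-pod job.
example = read_pytorchjob_settings(
    {
        "MASTER_ADDR": "job-master-0",
        "MASTER_PORT": "23456",
        "WORLD_SIZE": "4",
        "RANK": "1",
    }
)
```

Passing a plain mapping instead of `os.environ` keeps the sketch easy to try outside a cluster. A missing variable raises `KeyError`, which surfaces a misconfigured job early.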
