API Reference

Fabric

Fabric

Fabric accelerates your PyTorch training or inference code with minimal changes required.
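
As a rough sketch of the typical workflow (the model, optimizer, and data below are placeholders, and the lightning.fabric import path is assumed):

    import torch
    from lightning.fabric import Fabric

    # Placeholder model and data; substitute your own.
    model = torch.nn.Linear(32, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    dataset = torch.utils.data.TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

    fabric = Fabric(accelerator="auto", devices=1)
    fabric.launch()

    # Fabric moves objects to the right device and wraps them for the chosen strategy.
    model, optimizer = fabric.setup(model, optimizer)
    dataloader = fabric.setup_dataloaders(dataloader)

    model.train()
    for inputs, target in dataloader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), target)
        fabric.backward(loss)  # replaces loss.backward()
        optimizer.step()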

Accelerators

Accelerator

The Accelerator base class.

CPUAccelerator

Accelerator for CPU devices.

CUDAAccelerator

Accelerator for NVIDIA CUDA devices.

MPSAccelerator

Accelerator for Metal Apple Silicon GPU devices.

TPUAccelerator

Accelerator for TPU devices.
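
Accelerators are usually chosen through the accelerator argument rather than instantiated by hand; a short sketch (argument values assumed typical):

    from lightning.fabric import Fabric
    from lightning.fabric.accelerators import CUDAAccelerator

    # Select by name; "auto" picks the best backend available on this machine.
    fabric = Fabric(accelerator="auto", devices=1)

    # Or pass an accelerator instance explicitly.
    if CUDAAccelerator.is_available():
        fabric = Fabric(accelerator=CUDAAccelerator(), devices=2)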

Loggers

Logger

Base class for experiment loggers.

CSVLogger

Log to the local file system in CSV format.

TensorBoardLogger

Log to the local file system in TensorBoard format.
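
A sketch of attaching loggers to Fabric, assuming a version where Fabric exposes log and log_dict:

    from lightning.fabric import Fabric
    from lightning.fabric.loggers import CSVLogger, TensorBoardLogger

    loggers = [CSVLogger(root_dir="logs"), TensorBoardLogger(root_dir="logs")]
    fabric = Fabric(loggers=loggers)
    fabric.launch()

    # Metrics are fanned out to every configured logger.
    fabric.log("train/loss", 0.42)
    fabric.log_dict({"train/acc": 0.91, "epoch": 1})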

Plugins

Precision

Precision

Base class for all plugins handling the precision-specific parts of training.

DoublePrecision

Plugin for training with double (torch.float64) precision.

MixedPrecision

Plugin for Automatic Mixed Precision (AMP) training with torch.autocast.

TPUPrecision

Precision plugin for TPU integration.

TPUBf16Precision

Plugin that enables bfloat16 precision on TPUs.

FSDPPrecision

AMP for Fully Sharded Data Parallel training.
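
Precision is typically selected through the precision argument; the accepted values (for example 16, "bf16", or "16-mixed") differ between releases, so treat the strings below as assumptions:

    from lightning.fabric import Fabric
    from lightning.fabric.plugins import MixedPrecision

    # Select by value ...
    fabric = Fabric(accelerator="cuda", devices=1, precision="16-mixed")

    # ... or pass a configured precision plugin through `plugins`.
    fabric = Fabric(accelerator="cuda", devices=1,
                    plugins=[MixedPrecision(precision="16-mixed", device="cuda")])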

Environments

ClusterEnvironment

Specification of a cluster environment.

KubeflowEnvironment

Environment for distributed training using the PyTorchJob operator from Kubeflow.

LightningEnvironment

The default environment used by Lightning for a single node or free cluster (not managed).

LSFEnvironment

An environment for running on clusters managed by the LSF resource manager.

SLURMEnvironment

Cluster environment for training on a cluster managed by SLURM.

TorchElasticEnvironment

Environment for fault-tolerant and elastic training with torchelastic.

XLAEnvironment

Cluster environment for training on a TPU Pod with the PyTorch/XLA library.
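
Cluster environments are normally detected automatically from the job's environment variables; passing one explicitly overrides the detection. A sketch using SLURM (the auto_requeue option is SLURM-specific):

    from lightning.fabric import Fabric
    from lightning.fabric.plugins.environments import SLURMEnvironment

    fabric = Fabric(accelerator="cuda", devices=4, num_nodes=2, strategy="ddp",
                    plugins=[SLURMEnvironment(auto_requeue=False)])
    fabric.launch()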

IO

CheckpointIO

Interface to save/load checkpoints as they are saved through the Strategy.

TorchCheckpointIO

CheckpointIO that utilizes torch.save() and torch.load() to save and load checkpoints respectively, common for most use cases.

XLACheckpointIO

CheckpointIO that utilizes xm.save() to save checkpoints for TPU training strategies.
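
Custom checkpoint handling can be plugged in by subclassing CheckpointIO; a sketch under the assumption that the interface consists of the three methods shown (exact signatures may vary slightly between versions):

    import os
    from typing import Any, Dict, Optional

    import torch
    from lightning.fabric import Fabric
    from lightning.fabric.plugins import CheckpointIO


    class MyCheckpointIO(CheckpointIO):
        """Hypothetical CheckpointIO; add compression, encryption, or remote upload here."""

        def save_checkpoint(self, checkpoint: Dict[str, Any], path, storage_options: Optional[Any] = None) -> None:
            torch.save(checkpoint, path)

        def load_checkpoint(self, path, map_location: Optional[Any] = None) -> Dict[str, Any]:
            return torch.load(path, map_location=map_location)

        def remove_checkpoint(self, path) -> None:
            os.remove(path)


    # Assumes custom CheckpointIO instances are accepted through `plugins`.
    fabric = Fabric(plugins=[MyCheckpointIO()])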

Collectives

Collective

Interface for collective operations.

TorchCollective

Collective operations backed by torch.distributed.

SingleDeviceCollective

Collective for single-device (non-distributed) execution.
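
The collective plugins are mostly used internally by the strategies; in user code, cross-process communication is usually done through Fabric's convenience methods, as in this sketch:

    import torch
    from lightning.fabric import Fabric

    fabric = Fabric(accelerator="cpu", devices=2, strategy="ddp")
    fabric.launch()

    # These Fabric methods are backed by the strategy's collective implementation.
    local = torch.tensor([float(fabric.global_rank)])
    gathered = fabric.all_gather(local)   # one entry per process
    fabric.barrier()                      # wait for every process
    value = fabric.broadcast(local, src=0)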

Strategies

Strategy

Base class for all strategies that change the behaviour of the training, validation, and test loops.

DDPStrategy

Strategy for multi-process single-device training on one or multiple nodes.

DataParallelStrategy

Implements data-parallel training in a single process, i.e., the model gets replicated to each device and each gets a split of the data.

FSDPStrategy

Strategy for Fully Sharded Data Parallel provided by torch.distributed.

ParallelStrategy

Strategy for training with multiple processes in parallel.

SingleDeviceStrategy

Strategy that handles communication on a single device.

SingleTPUStrategy

Strategy for training on a single TPU device.
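
Strategies, like accelerators, can be selected by name or passed as configured instances; a sketch (extra DDPStrategy keyword arguments are assumed to be forwarded to torch's DistributedDataParallel wrapper):

    from lightning.fabric import Fabric
    from lightning.fabric.strategies import DDPStrategy

    # Select by name ...
    fabric = Fabric(accelerator="cuda", devices=4, strategy="ddp")

    # ... or pass a configured instance.
    fabric = Fabric(accelerator="cuda", devices=4,
                    strategy=DDPStrategy(find_unused_parameters=True))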

