API Reference

Fabric

Fabric

Fabric accelerates your PyTorch training or inference code with minimal changes required.
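
As a rough sketch of the typical workflow (the model, optimizer, and data below are placeholders, and the lightning.fabric import path is assumed):

    import torch
    from lightning.fabric import Fabric

    # Placeholder model and data; substitute your own.
    model = torch.nn.Linear(32, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    dataset = torch.utils.data.TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

    fabric = Fabric(accelerator="auto", devices=1)
    fabric.launch()

    # Fabric moves objects to the right device and wraps them for the chosen strategy.
    model, optimizer = fabric.setup(model, optimizer)
    dataloader = fabric.setup_dataloaders(dataloader)

    model.train()
    for inputs, target in dataloader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), target)
        fabric.backward(loss)  # replaces loss.backward()
        optimizer.step()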

Accelerators

Accelerator

The Accelerator base class.

CPUAccelerator

Accelerator for CPU devices.

CUDAAccelerator

Accelerator for NVIDIA CUDA devices.

MPSAccelerator

Accelerator for Metal Apple Silicon GPU devices.

TPUAccelerator

Accelerator for TPU devices.
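
Accelerators are usually chosen through the accelerator argument rather than instantiated by hand; a short sketch (argument values assumed typical):

    from lightning.fabric import Fabric
    from lightning.fabric.accelerators import CUDAAccelerator

    # Select by name; "auto" picks the best backend available on this machine.
    fabric = Fabric(accelerator="auto", devices=1)

    # Or pass an accelerator instance explicitly.
    if CUDAAccelerator.is_available():
        fabric = Fabric(accelerator=CUDAAccelerator(), devices=2)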

Loggers

Logger

Base class for experiment loggers.

CSVLogger

Log to the local file system in CSV format.

TensorBoardLogger

Log to the local file system in TensorBoard format.
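
A sketch of attaching loggers to Fabric, assuming a version where Fabric exposes log and log_dict:

    from lightning.fabric import Fabric
    from lightning.fabric.loggers import CSVLogger, TensorBoardLogger

    loggers = [CSVLogger(root_dir="logs"), TensorBoardLogger(root_dir="logs")]
    fabric = Fabric(loggers=loggers)
    fabric.launch()

    # Metrics are fanned out to every configured logger.
    fabric.log("train/loss", 0.42)
    fabric.log_dict({"train/acc": 0.91, "epoch": 1})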

Plugins

Precision

Precision

Base class for all plugins handling the precision-specific parts of training.

DoublePrecision

Plugin for training with double (torch.float64) precision.

MixedPrecision

Plugin for Automatic Mixed Precision (AMP) training with torch.autocast.

TPUPrecision

Precision plugin for TPU integration.

TPUBf16Precision

Plugin that enables bfloat16 precision on TPUs.

FSDPPrecision

AMP for Fully Sharded Data Parallel training.
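
Precision is typically selected through the precision argument; the accepted values (for example 16, "bf16", or "16-mixed") differ between releases, so treat the strings below as assumptions:

    from lightning.fabric import Fabric
    from lightning.fabric.plugins import MixedPrecision

    # Select by value ...
    fabric = Fabric(accelerator="cuda", devices=1, precision="16-mixed")

    # ... or pass a configured precision plugin through `plugins`.
    fabric = Fabric(accelerator="cuda", devices=1,
                    plugins=[MixedPrecision(precision="16-mixed", device="cuda")])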

Environments

ClusterEnvironment

Specification of a cluster environment.

KubeflowEnvironment

Environment for distributed training using the PyTorchJob operator from Kubeflow.

LightningEnvironment

The default environment used by Lightning for a single node or free cluster (not managed).

LSFEnvironment

An environment for running on clusters managed by the LSF resource manager.

SLURMEnvironment

Cluster environment for training on a cluster managed by SLURM.

TorchElasticEnvironment

Environment for fault-tolerant and elastic training with torchelastic.

XLAEnvironment

Cluster environment for training on a TPU Pod with the PyTorch/XLA library.
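
Cluster environments are normally detected automatically from the job's environment variables; passing one explicitly overrides the detection. A sketch using SLURM (the auto_requeue option is SLURM-specific):

    from lightning.fabric import Fabric
    from lightning.fabric.plugins.environments import SLURMEnvironment

    fabric = Fabric(accelerator="cuda", devices=4, num_nodes=2, strategy="ddp",
                    plugins=[SLURMEnvironment(auto_requeue=False)])
    fabric.launch()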

IO

CheckpointIO

Interface to save/load checkpoints as they are saved through the Strategy.

TorchCheckpointIO

CheckpointIO that utilizes torch.save() and torch.load() to save and load checkpoints respectively, common for most use cases.

XLACheckpointIO

CheckpointIO that utilizes xm.save() to save checkpoints for TPU training strategies.
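
Custom checkpoint handling can be plugged in by subclassing CheckpointIO; a sketch under the assumption that the interface consists of the three methods shown (exact signatures may vary slightly between versions):

    import os
    from typing import Any, Dict, Optional

    import torch
    from lightning.fabric import Fabric
    from lightning.fabric.plugins import CheckpointIO


    class MyCheckpointIO(CheckpointIO):
        """Hypothetical CheckpointIO; add compression, encryption, or remote upload here."""

        def save_checkpoint(self, checkpoint: Dict[str, Any], path, storage_options: Optional[Any] = None) -> None:
            torch.save(checkpoint, path)

        def load_checkpoint(self, path, map_location: Optional[Any] = None) -> Dict[str, Any]:
            return torch.load(path, map_location=map_location)

        def remove_checkpoint(self, path) -> None:
            os.remove(path)


    # Assumes custom CheckpointIO instances are accepted through `plugins`.
    fabric = Fabric(plugins=[MyCheckpointIO()])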

Collectives

Collective

Interface for collective operations.

TorchCollective

Collective operations backed by torch.distributed.

SingleDeviceCollective

Collective for single-device (non-distributed) execution.
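
The collective plugins are mostly used internally by the strategies; in user code, cross-process communication is usually done through Fabric's convenience methods, as in this sketch:

    import torch
    from lightning.fabric import Fabric

    fabric = Fabric(accelerator="cpu", devices=2, strategy="ddp")
    fabric.launch()

    # These Fabric methods are backed by the strategy's collective implementation.
    local = torch.tensor([float(fabric.global_rank)])
    gathered = fabric.all_gather(local)   # one entry per process
    fabric.barrier()                      # wait for every process
    value = fabric.broadcast(local, src=0)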

Strategies

Strategy

Base class for all strategies that change the behaviour of the training, validation, and test loops.

DDPStrategy

Strategy for multi-process single-device training on one or multiple nodes.

DataParallelStrategy

Implements data-parallel training in a single process, i.e., the model gets replicated to each device and each gets a split of the data.

FSDPStrategy

Strategy for Fully Sharded Data Parallel provided by torch.distributed.

ParallelStrategy

Strategy for training with multiple processes in parallel.

SingleDeviceStrategy

Strategy that handles communication on a single device.

SingleTPUStrategy

Strategy for training on a single TPU device.
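
Strategies, like accelerators, can be selected by name or passed as configured instances; a sketch (extra DDPStrategy keyword arguments are assumed to be forwarded to torch's DistributedDataParallel wrapper):

    from lightning.fabric import Fabric
    from lightning.fabric.strategies import DDPStrategy

    # Select by name ...
    fabric = Fabric(accelerator="cuda", devices=4, strategy="ddp")

    # ... or pass a configured instance.
    fabric = Fabric(accelerator="cuda", devices=4,
                    strategy=DDPStrategy(find_unused_parameters=True))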

