API References
Accelerator API
The Accelerator base class.
Accelerator for CPU devices.
Accelerator for IPUs.
Accelerator for GPU devices.
Accelerator for TPU devices.
Core API
LightningDataModule for loading DataLoaders with ease.
Various hooks to be used in the Lightning code.
The LightningModule: an nn.Module with many additional features.
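The hooks idea can be illustrated without Lightning itself. The sketch below is a hypothetical, framework-free stand-in: the names `MiniHooks`, `MyModule` and `fit` are invented for illustration, and the hook signatures are simplified rather than copied from Lightning. Every hook exists with a no-op default, the driver calls them at fixed points, and a subclass overrides only the hooks it needs.

```python
from typing import List


class MiniHooks:
    """Hypothetical hook base: every hook exists and does nothing by default."""

    def on_fit_start(self) -> None:
        pass

    def on_train_epoch_start(self, epoch: int) -> None:
        pass

    def on_train_epoch_end(self, epoch: int) -> None:
        pass

    def on_fit_end(self) -> None:
        pass


class MyModule(MiniHooks):
    """A user module that overrides only the hooks it cares about."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def on_fit_start(self) -> None:
        self.events.append("fit start")

    def on_train_epoch_end(self, epoch: int) -> None:
        self.events.append(f"epoch {epoch} end")


def fit(module: MiniHooks, max_epochs: int) -> None:
    """Hypothetical driver: calls each hook at a fixed point of the loop."""
    module.on_fit_start()
    for epoch in range(max_epochs):
        module.on_train_epoch_start(epoch)
        # The per-batch training work would run here.
        module.on_train_epoch_end(epoch)
    module.on_fit_end()
```

Because the base class supplies defaults, the driver never has to check whether a hook was implemented; it simply calls all of them.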
Callbacks API
Abstract base class used to build new callbacks.
Early Stopping
GPU Stats Monitor
Gradient Accumulator
Learning Rate Monitor
Model Checkpointing
Progress Bars
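A callback packages behaviour that would otherwise clutter the model. The sketch below assumes a simplified, hypothetical design (`MiniCallback`, `MiniEarlyStopping` and `MiniTrainer` are invented names, not Lightning classes): the trainer calls one hook per validation run, and an early-stopping callback sets a stop flag once the monitored metric has failed to improve for `patience` consecutive checks. The `monitor`, `patience`, `min_delta` and `mode` names echo options commonly exposed by early-stopping callbacks; the exact rules here are an assumption.

```python
from typing import Dict, List


class MiniCallback:
    """Hypothetical callback base class: hooks are no-ops unless overridden."""

    def on_validation_end(self, trainer: "MiniTrainer", metrics: Dict[str, float]) -> None:
        pass


class MiniEarlyStopping(MiniCallback):
    """Requests a stop once `monitor` has not improved for `patience` checks."""

    def __init__(self, monitor: str, patience: int = 3, min_delta: float = 0.0, mode: str = "min") -> None:
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.monitor = monitor
        self.patience = patience
        self.min_delta = min_delta
        self.mode = mode
        self.best = float("inf") if mode == "min" else float("-inf")
        self.wait = 0

    def _improved(self, value: float) -> bool:
        # An improvement must beat the best value by more than min_delta.
        if self.mode == "min":
            return value < self.best - self.min_delta
        return value > self.best + self.min_delta

    def on_validation_end(self, trainer, metrics):
        value = metrics[self.monitor]
        if self._improved(value):
            self.best = value
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                trainer.should_stop = True


class MiniTrainer:
    """Hypothetical trainer that only replays a list of validation losses."""

    def __init__(self, callbacks: List[MiniCallback]) -> None:
        self.callbacks = callbacks
        self.should_stop = False

    def fit(self, val_losses: List[float]) -> int:
        """Returns the number of epochs actually run."""
        epochs_run = 0
        for loss in val_losses:
            epochs_run += 1
            for callback in self.callbacks:
                callback.on_validation_end(self, {"val_loss": loss})
            if self.should_stop:
                break
        return epochs_run
```

With `patience=2`, the losses `[1.0, 0.8, 0.85, 0.9, 0.7]` stop training after the fourth epoch, because 0.85 and 0.9 are two consecutive non-improvements over 0.8.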
Loggers API
Abstract base class used to build new loggers.
Comet Logger
CSV Logger
MLflow Logger
Neptune Logger
TensorBoard Logger
Test Tube Logger
Weights and Biases Logger
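To show how a logger base class separates what gets logged from where it goes, here is a stdlib-only sketch. `MiniLogger` and `MiniCSVLogger` are hypothetical names; the loggers listed above have their own, richer interfaces. The CSV variant buffers rows and writes a single header covering every metric name seen, leaving cells empty where a step did not log a metric.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, TextIO


class MiniLogger(ABC):
    """Hypothetical logger interface: subclasses decide where metrics are written."""

    @abstractmethod
    def log_metrics(self, metrics: Dict[str, float], step: int) -> None:
        """Record one set of metric values for a training step."""

    def finalize(self) -> None:
        """Flush any buffered output; a no-op by default."""


class MiniCSVLogger(MiniLogger):
    """Buffers metric rows and writes them as CSV when finalize() is called."""

    def __init__(self, stream: Optional[TextIO] = None) -> None:
        # Defaults to an in-memory buffer; to write to disk, pass a file
        # opened in text mode with newline="" (as the csv module recommends).
        self.stream = stream if stream is not None else io.StringIO()
        self.rows: List[Dict[str, float]] = []

    def log_metrics(self, metrics, step):
        self.rows.append({"step": step, **metrics})

    def finalize(self):
        # The header is the union of all metric names; missing cells stay empty.
        names = sorted({key for row in self.rows for key in row if key != "step"})
        writer = csv.DictWriter(self.stream, fieldnames=["step"] + names)
        writer.writeheader()
        writer.writerows(self.rows)
```

Buffering until `finalize()` keeps the header stable even when new metric names appear mid-run; the trade-off is that nothing reaches the stream until the end.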
Loop API
Base Classes
Basic loop interface.
Base class to loop over all dataloaders.
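A rough sketch of the loop interface, assuming a structure in which `run()` calls `reset()`, then calls `advance()` repeatedly until `done` is true or `advance()` raises `StopIteration`, and finally returns the result of `on_run_end()`. The classes `MiniLoop`, `BatchLoop` and `EpochLoop` are hypothetical and only loosely modeled on the nesting described in this section, where an epoch loop runs a batch loop once per epoch.

```python
from typing import Any, List, Sequence


class MiniLoop:
    """Hypothetical loop base class: run() drives reset/advance until done."""

    @property
    def done(self) -> bool:
        raise NotImplementedError

    def reset(self) -> None:
        """Prepare state before a run."""

    def advance(self) -> None:
        """Perform one unit of work (one batch, one epoch, ...)."""
        raise NotImplementedError

    def on_run_end(self) -> Any:
        return None

    def run(self) -> Any:
        self.reset()
        while not self.done:
            try:
                self.advance()
            except StopIteration:
                # Lets a loop finish by exhausting an iterator instead of setting `done`.
                break
        return self.on_run_end()


class BatchLoop(MiniLoop):
    """Consumes every batch once (one 'epoch') and returns the sum of the batches."""

    def __init__(self, batches: Sequence[int]) -> None:
        self.batches = batches

    @property
    def done(self) -> bool:
        return False  # termination comes from StopIteration in advance()

    def reset(self) -> None:
        self.iterator = iter(self.batches)
        self.total = 0

    def advance(self) -> None:
        self.total += next(self.iterator)

    def on_run_end(self) -> int:
        return self.total


class EpochLoop(MiniLoop):
    """Runs its child loop once per epoch, collecting each epoch's result."""

    def __init__(self, max_epochs: int, child: MiniLoop) -> None:
        self.max_epochs = max_epochs
        self.child = child

    @property
    def done(self) -> bool:
        return self.epoch >= self.max_epochs

    def reset(self) -> None:
        self.epoch = 0
        self.outputs: List[Any] = []

    def advance(self) -> None:
        self.outputs.append(self.child.run())
        self.epoch += 1

    def on_run_end(self) -> List[Any]:
        return self.outputs
```

For example, `EpochLoop(max_epochs=2, child=BatchLoop([1, 2, 3])).run()` returns `[6, 6]`: the child loop is reset and run to exhaustion once per epoch.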
Default Loop Implementations
Training
This loop iterates over the epochs to run the training.
Runs over all batches in a dataloader (one epoch).
Runs over a single batch of data.
Runs over a sequence of optimizers.
A special loop implementing what is known in Lightning as Manual Optimization, where the optimization happens entirely in the
Validation and Testing
Loops over all dataloaders for evaluation.
This is the loop performing the evaluation.
Prediction
Loop to run over dataloaders for prediction.
Loop performing prediction on arbitrary sequentially used dataloaders.
Plugins API
Training Type Plugins
Base class for all training type plugins that change the behaviour of the training, validation and test loops.
Plugin that handles communication on a single device.
Plugin for training with multiple processes in parallel.
Implements data-parallel training in a single process, i.e., the model gets replicated to each device and each gets a split of the data.
Plugin for multi-process, single-device training on one or multiple nodes.
DDP2 behaves like DP within a node, but synchronization across nodes behaves like DDP.
Optimizer and gradient sharded training provided by FairScale.
Optimizer sharded training provided by FairScale.
Spawns processes using the
Provides capabilities to run training using the DeepSpeed library, with training optimizations for large, billion-parameter models.
Plugin for Horovod distributed training integration.
Plugin for training on a single TPU device.
Plugin for training multiple TPU devices using the
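The data-parallel plugins above rely on one piece of arithmetic: if each replica computes the gradient of a mean loss on an equally sized shard of the batch, the average of those per-replica gradients equals the full-batch gradient, so averaging across devices (an all-reduce in the DDP case) reproduces single-device training. The plain-Python sketch below checks that identity for a one-parameter model `y ≈ w * x` with squared error; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y)


def mse_grad(w: float, batch: Sequence[Sample]) -> float:
    """d/dw of mean((w * x - y) ** 2) over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def split_batch(batch: Sequence[Sample], num_devices: int) -> List[List[Sample]]:
    """Give each replica an equal, contiguous slice of the batch."""
    if len(batch) % num_devices:
        raise ValueError("this sketch assumes the batch divides evenly across devices")
    size = len(batch) // num_devices
    return [list(batch[i * size:(i + 1) * size]) for i in range(num_devices)]


def data_parallel_grad(w: float, batch: Sequence[Sample], num_devices: int) -> float:
    """Each replica computes a local gradient; averaging stands in for the all-reduce."""
    local_grads = [mse_grad(w, shard) for shard in split_batch(batch, num_devices)]
    return sum(local_grads) / num_devices
```

The equal-shard assumption matters: with uneven shards, a plain average of per-shard means would weight individual samples differently from the full-batch mean.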
Precision Plugins
Base class for all plugins handling the precision-specific parts of the training.
Base class for mixed precision.
Plugin for Native Mixed Precision (AMP) training with
Native AMP for Sharded Training.
Mixed precision plugin based on NVIDIA Apex (https://github.com/NVIDIA/apex).
Precision plugin for DeepSpeed integration.
Plugin that enables bfloat16 on TPUs.
Plugin for training with double (
Native AMP for Fully Sharded Training.
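Mixed precision with 16-bit floats usually relies on dynamic loss scaling: multiply the loss by a large factor so small gradients do not underflow, unscale the gradients before the optimizer step, skip the step and shrink the scale if any gradient overflowed, and grow the scale again after a run of clean steps. The sketch below is a simplified, framework-free illustration of that technique, not the code of any plugin listed here; its defaults mirror those documented for PyTorch's `torch.cuda.amp.GradScaler` (initial scale 2^16, growth factor 2, backoff factor 0.5, growth interval 2000). The class name `DynamicLossScaler` is hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DynamicLossScaler:
    """Simplified dynamic loss scaling for fp16 mixed precision."""

    scale: float = 2.0 ** 16
    growth_factor: float = 2.0
    backoff_factor: float = 0.5
    growth_interval: int = 2000
    _good_steps: int = field(default=0, init=False, repr=False)

    def scale_loss(self, loss: float) -> float:
        # Backpropagating the scaled loss scales every gradient by the same factor.
        return loss * self.scale

    def unscale(self, scaled_grads: List[float]) -> Tuple[List[float], bool]:
        """Divide out the scale; report whether every gradient is finite."""
        grads = [g / self.scale for g in scaled_grads]
        return grads, all(math.isfinite(g) for g in grads)

    def update(self, found_overflow: bool) -> bool:
        """Adjust the scale; returns True if the optimizer step should run."""
        if found_overflow:
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return False  # skip this step: the gradients are unusable
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            self.scale *= self.growth_factor
            self._good_steps = 0
        return True
```

bfloat16 keeps float32's exponent range, which is why it is typically used without loss scaling.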
Cluster Environments
Specification of a cluster environment.
The default environment used by Lightning for a single node or free cluster (not managed).
An environment for running on clusters managed by the LSF resource manager.
Environment for fault-tolerant and elastic training with torchelastic.
Environment for distributed training using the PyTorchJob operator from Kubeflow.
Cluster environment for training on a cluster managed by SLURM.
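Much of what a cluster environment does is translate scheduler-specific environment variables into a world size, a global rank and a local rank. The sketch below assumes that torchelastic (torchrun) workers see `WORLD_SIZE`, `RANK` and `LOCAL_RANK`, and that SLURM tasks see `SLURM_NTASKS`, `SLURM_PROCID` and `SLURM_LOCALID`. `ProcessInfo`, `detect` and the detection order are hypothetical; the environment classes listed above check more than this.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProcessInfo:
    world_size: int
    global_rank: int
    local_rank: int


def from_torchelastic(env: Mapping[str, str]) -> ProcessInfo:
    # torchelastic / torchrun export these variables to every worker process.
    return ProcessInfo(int(env["WORLD_SIZE"]), int(env["RANK"]), int(env["LOCAL_RANK"]))


def from_slurm(env: Mapping[str, str]) -> ProcessInfo:
    # SLURM exposes the total task count plus global and node-local task ids.
    return ProcessInfo(int(env["SLURM_NTASKS"]), int(env["SLURM_PROCID"]), int(env["SLURM_LOCALID"]))


def detect(env: Mapping[str, str]) -> ProcessInfo:
    """Hypothetical detection order: SLURM, then torchelastic, else one local process."""
    if "SLURM_NTASKS" in env:
        return from_slurm(env)
    if all(key in env for key in ("WORLD_SIZE", "RANK", "LOCAL_RANK")):
        return from_torchelastic(env)
    return ProcessInfo(world_size=1, global_rank=0, local_rank=0)
```

Taking the environment as a mapping argument (for example `detect(os.environ)`) keeps the logic testable without launching a real job.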
Checkpoint IO Plugins
Interface to save/load checkpoints as they are saved through the
CheckpointIO that utilizes
CheckpointIO that utilizes
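A checkpoint IO plugin decouples how a checkpoint dictionary is persisted from when the trainer decides to save it. The stdlib-only sketch below assumes a hypothetical interface with `save_checkpoint`, `load_checkpoint` and `remove_checkpoint` methods (`MiniCheckpointIO` and `PickleCheckpointIO` are invented names) and uses `pickle` in place of any framework serializer. Writing to a temporary file and then calling `os.replace` means a reader never sees a half-written checkpoint at the target path.

```python
import os
import pickle
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Dict


class MiniCheckpointIO(ABC):
    """Hypothetical interface: how checkpoints are stored, independent of when."""

    @abstractmethod
    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None: ...

    @abstractmethod
    def load_checkpoint(self, path: str) -> Dict[str, Any]: ...

    def remove_checkpoint(self, path: str) -> None:
        if os.path.exists(path):
            os.remove(path)


class PickleCheckpointIO(MiniCheckpointIO):
    """Stores the checkpoint dict with pickle, writing atomically via a temp file."""

    def save_checkpoint(self, checkpoint, path):
        directory = os.path.dirname(os.path.abspath(path))
        os.makedirs(directory, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                pickle.dump(checkpoint, f)
            os.replace(tmp_path, path)  # atomic rename onto the final path
        except BaseException:
            # Don't leave a stray temp file behind if serialization fails.
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise

    def load_checkpoint(self, path):
        # pickle can execute code on load: only open checkpoints you trust.
        with open(path, "rb") as f:
            return pickle.load(f)
```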
Profiler API
Specification of a profiler.
This profiler uses Python’s cProfile to record more detailed information about time spent in each function call recorded during a given action.
If you wish to write a custom profiler, you should inherit from this class.
This class should be used when you don’t want the (small) overhead of profiling.
This profiler uses PyTorch’s Autograd Profiler and lets you inspect the cost of.
This profiler simply records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run.
This profiler helps you debug and optimize training workload performance for your models using Cloud TPU performance tools.
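The simple profiler's description (record each action's duration in seconds, report the mean per action and the total) maps onto a small context manager. `MiniSimpleProfiler` below is a hypothetical, stdlib-only sketch built on `time.perf_counter`, not the profiler class documented here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List


class MiniSimpleProfiler:
    """Records wall-clock durations per action; reports mean and total (seconds)."""

    def __init__(self) -> None:
        self.durations: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def profile(self, action: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the profiled block raises.
            self.durations[action].append(time.perf_counter() - start)

    def summary(self) -> Dict[str, Dict[str, float]]:
        return {
            action: {"mean_s": sum(d) / len(d), "total_s": sum(d), "calls": len(d)}
            for action, d in self.durations.items()
        }
```

Typical use wraps each step in `with profiler.profile("training_step"):` and reads `profiler.summary()` at the end of the run.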
Trainer API
Trainer to automate the training.
LightningLite API
Lite accelerates your PyTorch training or inference code with minimal changes required.
Tuner API
Tuner class to tune your model.
Utilities API
Helper functions to aid reproducibility of models.
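Reproducibility helpers generally come down to seeding every random number generator from one value. Lightning's `seed_everything` covers Python, NumPy and PyTorch; the stdlib-only sketch below seeds only Python's `random` module, and both the helper name `seed_all` and the environment variable `EXAMPLE_GLOBAL_SEED` are hypothetical.

```python
import os
import random
from typing import Optional


def seed_all(seed: Optional[int] = None) -> int:
    """Hypothetical helper: seed Python's RNG and record the seed for subprocesses."""
    if seed is None:
        # Pick a seed at random, but return it so the run can be reproduced later.
        seed = random.randint(0, 2**32 - 1)
    random.seed(seed)
    # Exporting the seed lets child processes re-seed with the same value.
    os.environ["EXAMPLE_GLOBAL_SEED"] = str(seed)
    return seed
```

Returning the chosen seed matters most when none is passed: logging it is the only way to replay that run.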