accelerators

| Class | Description |
| --- | --- |
| Accelerator | The Accelerator base class for Lightning PyTorch. |
| CPUAccelerator | Accelerator for CPU devices. |
| CUDAAccelerator | Accelerator for NVIDIA CUDA devices. |
| HPUAccelerator | Accelerator for HPU devices. |
| IPUAccelerator | Accelerator for IPU devices. |
| TPUAccelerator | Accelerator for TPU devices. |
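Accelerators are usually selected through the Trainer rather than instantiated directly. A short sketch, assuming a recent pytorch_lightning release:

```python
import pytorch_lightning as pl

# Let Lightning pick the accelerator and device count automatically ...
trainer = pl.Trainer(accelerator="auto", devices="auto")

# ... or request a specific accelerator explicitly, e.g. two CUDA devices.
trainer = pl.Trainer(accelerator="cuda", devices=2)
```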
callbacks

| Class | Description |
| --- | --- |
| BackboneFinetuning | Finetune a backbone model based on a user-defined learning rate schedule. |
| BaseFinetuning | Implements the base logic for writing your own finetuning callback. |
| BasePredictionWriter | Base class to implement how the predictions should be stored. |
| BatchSizeFinder | Tries to find the largest batch size for a given model that does not cause an out-of-memory (OOM) error. |
| Callback | Abstract base class used to build new callbacks. |
| DeviceStatsMonitor | Automatically monitors and logs device stats during the training, validation, and testing stages. |
| EarlyStopping | Monitor a metric and stop training when it stops improving. |
| GradientAccumulationScheduler | Change the gradient accumulation factor according to a schedule. |
| LambdaCallback | Create a simple callback on the fly using lambda functions. |
| LearningRateFinder | Performs a learning rate range test to suggest a good initial learning rate. |
| LearningRateMonitor | Automatically monitors and logs the learning rate of learning rate schedulers during training. |
| ModelCheckpoint | Save the model periodically by monitoring a quantity. |
| ModelPruning | Model pruning callback, using PyTorch's prune utilities. |
| ModelSummary | Generates a summary of all layers in a LightningModule. |
| OnExceptionCheckpoint | Used to save a checkpoint on exception. |
| ProgressBar | The base class for progress bars in Lightning. |
| RichModelSummary | Generates a summary of all layers in a LightningModule with rich text formatting. |
| RichProgressBar | Create a progress bar with rich text formatting. |
| StochasticWeightAveraging | Implements the Stochastic Weight Averaging (SWA) callback to average a model. |
| Timer | Tracks the time spent in the training, validation, and test loops, and interrupts the Trainer if the given time limit for the training loop is reached. |
| TQDMProgressBar | The default progress bar used by Lightning. |
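Callbacks are passed to the Trainer as a list. A brief sketch combining two of the callbacks above; the monitored key `val_loss` is an assumption and must match a metric logged by your LightningModule:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

# Stop training when `val_loss` stops improving, and keep only the best checkpoint.
early_stop = EarlyStopping(monitor="val_loss", mode="min", patience=3)
checkpoint = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)

trainer = pl.Trainer(callbacks=[early_stop, checkpoint])
```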
cli

| Class | Description |
| --- | --- |
| LightningCLI | Implementation of a configurable command-line tool for pytorch-lightning. |
| LightningArgumentParser | Extension of jsonargparse's ArgumentParser for pytorch-lightning. |
| SaveConfigCallback | Saves a LightningCLI config to the log_dir when training starts. |
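A minimal sketch of a LightningCLI entry point; `MyModel` and `MyDataModule` are hypothetical placeholders for your own classes:

```python
# train.py
from pytorch_lightning.cli import LightningCLI

from my_project import MyDataModule, MyModel  # hypothetical imports

if __name__ == "__main__":
    # Builds an argument parser from the model, datamodule, and Trainer
    # signatures, then runs the requested subcommand (fit, validate, ...).
    LightningCLI(MyModel, MyDataModule)
```

Invoked from the shell, e.g. `python train.py fit --trainer.max_epochs=5`.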
core

| Class | Description |
| --- | --- |
| CheckpointHooks | Hooks to be used with checkpointing. |
| DataHooks | Hooks to be used for data-related operations. |
| ModelHooks | Hooks to be used in a LightningModule. |
| LightningDataModule | A DataModule standardizes the training, validation, and test splits, data preparation, and transforms. |
| LightningOptimizer | Wraps the user's optimizers and correctly handles the backward and optimizer_step logic across accelerators, AMP, and accumulate_grad_batches. |
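A sketch of the LightningDataModule hooks that standardize splits and loaders; `MyDataset` is a hypothetical map-style dataset:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader, random_split

class MyDataModule(pl.LightningDataModule):
    def setup(self, stage=None):
        # Split one dataset into train/val subsets (MyDataset is hypothetical).
        full = MyDataset()
        n_val = len(full) // 5
        self.train_set, self.val_set = random_split(full, [len(full) - n_val, n_val])

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=32, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=32)
```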
loggers

| Module | Description |
| --- | --- |
| logger | Abstract base class used to build new loggers. |
| comet | Comet Logger. |
| csv_logs | CSV logger. |
| mlflow | MLflow Logger. |
| neptune | Neptune Logger. |
| tensorboard | TensorBoard Logger. |
| wandb | Weights and Biases Logger. |
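Loggers are also configured through the Trainer; passing a list activates several at once. For example:

```python
import pytorch_lightning as pl
from pytorch_lightning.loggers import CSVLogger, TensorBoardLogger

# Log the same run to TensorBoard and to CSV files side by side.
trainer = pl.Trainer(
    logger=[
        TensorBoardLogger(save_dir="logs/", name="my_run"),
        CSVLogger(save_dir="logs/", name="my_run"),
    ]
)
```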
plugins

precision

| Class | Description |
| --- | --- |
| DeepSpeedPrecisionPlugin | Precision plugin for DeepSpeed integration. |
| DoublePrecisionPlugin | Plugin for training with double (torch.float64) precision. |
| FSDPMixedPrecisionPlugin | AMP for Fully Sharded Data Parallel (FSDP) training. |
| HPUPrecisionPlugin | Plugin that enables bfloat/half support on HPUs. |
| IPUPrecisionPlugin | Precision plugin for IPU integration. |
| MixedPrecisionPlugin | Plugin for Automatic Mixed Precision (AMP) training with torch.autocast. |
| PrecisionPlugin | Base class for all plugins handling the precision-specific parts of training. |
| TPUBf16PrecisionPlugin | Plugin that enables bfloat16 on TPUs. |
| TPUPrecisionPlugin | Precision plugin for TPU integration. |
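The usual entry point is the Trainer's `precision` flag, which selects the matching plugin under the hood. A sketch, assuming the string values of pytorch_lightning 2.x:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(precision="16-mixed")    # AMP via torch.autocast
trainer = pl.Trainer(precision="bf16-mixed")  # bfloat16 mixed precision
trainer = pl.Trainer(precision=64)            # double (torch.float64) precision
```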
environments

| Class | Description |
| --- | --- |
| ClusterEnvironment | Specification of a cluster environment. |
| KubeflowEnvironment | Environment for distributed training using the PyTorchJob operator from Kubeflow. |
| LightningEnvironment | The default environment used by Lightning for a single node or free cluster (not managed). |
| LSFEnvironment | An environment for running on clusters managed by the LSF resource manager. |
| MPIEnvironment | An environment for running on clusters with processes created through MPI. |
| SLURMEnvironment | Cluster environment for training on a cluster managed by SLURM. |
| TorchElasticEnvironment | Environment for fault-tolerant and elastic training with torchelastic. |
| XLAEnvironment | Cluster environment for training on a TPU Pod with the PyTorch/XLA library. |
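Cluster environments are normally auto-detected; passing one explicitly overrides detection or tweaks its behaviour. One possible use:

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins.environments import SLURMEnvironment

# Use SLURM's scheduling metadata, but opt out of automatic requeueing.
trainer = pl.Trainer(plugins=[SLURMEnvironment(auto_requeue=False)])
```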
io

| Class | Description |
| --- | --- |
| AsyncCheckpointIO | Enables saving checkpoints asynchronously in a thread. |
| CheckpointIO | Interface to save/load checkpoints as they are saved through the Strategy. |
| HPUCheckpointIO | CheckpointIO to save checkpoints for HPU training strategies. |
| TorchCheckpointIO | CheckpointIO that utilizes torch.save and torch.load to save and load checkpoints. |
| XLACheckpointIO | CheckpointIO that utilizes xm.save to save checkpoints for TPU training strategies. |
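Checkpoint IO plugins are passed to the Trainer; AsyncCheckpointIO wraps another CheckpointIO and offloads its saves. A sketch, assuming a release that ships AsyncCheckpointIO:

```python
import pytorch_lightning as pl
from pytorch_lightning.plugins.io import AsyncCheckpointIO, TorchCheckpointIO

# Save checkpoints with torch.save in a background thread so the
# training loop is not blocked on disk I/O.
async_io = AsyncCheckpointIO(checkpoint_io=TorchCheckpointIO())
trainer = pl.Trainer(plugins=[async_io])
```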
others

| Class | Description |
| --- | --- |
| LayerSync | Abstract base class for creating plugins that wrap layers of a model with synchronization logic for multiprocessing. |
| TorchSyncBatchNorm | A plugin that wraps all batch normalization layers of a model with synchronization logic for multiprocessing. |
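The batch-norm plugin is usually enabled via the Trainer's `sync_batchnorm` flag rather than constructed directly:

```python
import pytorch_lightning as pl

# Convert all BatchNorm layers to synchronized variants so statistics
# are computed across all processes in a multi-GPU run.
trainer = pl.Trainer(accelerator="gpu", devices=2, sync_batchnorm=True)
```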
profiler

| Class | Description |
| --- | --- |
| AdvancedProfiler | Uses Python's cProfile to record more detailed information about the time spent in each function call during a given action. |
| PassThroughProfiler | Use this when you don't want the (small) overhead of profiling. |
| Profiler | If you wish to write a custom profiler, inherit from this class. |
| PyTorchProfiler | Uses PyTorch's Autograd Profiler and lets you inspect the cost of different operators inside your model, on both the CPU and GPU. |
| SimpleProfiler | Records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run. |
| XLAProfiler | Helps you debug and optimize training workload performance for your models using Cloud TPU performance tools. |
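Profilers can be selected by name or passed as configured instances. A sketch, assuming the pytorch_lightning.profilers module path of recent releases:

```python
import pytorch_lightning as pl
from pytorch_lightning.profilers import SimpleProfiler

# Shorthand string selection ("simple", "advanced", "pytorch", "xla") ...
trainer = pl.Trainer(profiler="simple")

# ... or an explicit instance when you want to control the output file.
trainer = pl.Trainer(profiler=SimpleProfiler(filename="perf_logs"))
```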
strategies

| Class | Description |
| --- | --- |
| DDPStrategy | Strategy for multi-process single-device training on one or multiple nodes. |
| DeepSpeedStrategy | Provides capabilities to run training using the DeepSpeed library, with training optimizations for large, billion-parameter models. |
| FSDPStrategy | Strategy for Fully Sharded Data Parallel provided by torch.distributed. |
| HPUParallelStrategy | Strategy for distributed training on multiple HPU devices. |
| IPUStrategy | Strategy for training on IPU devices. |
| ParallelStrategy | Strategy for training with multiple processes in parallel. |
| SingleDeviceStrategy | Strategy that handles communication on a single device. |
| SingleHPUStrategy | Strategy for training on a single HPU device. |
| SingleTPUStrategy | Strategy for training on a single TPU device. |
| Strategy | Base class for all strategies that change the behaviour of the training, validation, and test loop. |
| XLAStrategy | Strategy for training multiple TPU devices using the torch_xla.distributed.xla_multiprocessing.spawn method. |
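Strategies, like accelerators, are chosen through the Trainer, either by shorthand string or as an instance when extra arguments are needed:

```python
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

# Shorthand string selection ...
trainer = pl.Trainer(strategy="ddp", accelerator="gpu", devices=4)

# ... or an explicit instance to forward options to DistributedDataParallel.
trainer = pl.Trainer(
    strategy=DDPStrategy(find_unused_parameters=False),
    accelerator="gpu",
    devices=4,
)
```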
utilities

| Module | Description |
| --- | --- |
| deepspeed | Utilities that can be used with DeepSpeed. |
| distributed | Utilities that can be used with distributed training. |
| memory | Utilities related to memory. |
| parsing | Utilities used for parameter parsing. |
| rank_zero | Utilities for calling functions on a particular rank. |
| seed | Utilities to help with reproducibility of models. |
| warnings | Warning-related utilities. |
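Two commonly used helpers from these modules:

```python
import pytorch_lightning as pl
from pytorch_lightning.utilities import rank_zero_info

# Seed Python, NumPy, and PyTorch RNGs (and dataloader workers) for reproducibility.
pl.seed_everything(42, workers=True)

# Emit a log message from rank 0 only, to avoid duplicates in distributed runs.
rank_zero_info("Starting training")
```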