accelerators
The Accelerator base class for Lightning PyTorch.  | 
|
Accelerator for CPU devices.  | 
|
Accelerator for NVIDIA CUDA devices.  | 
|
Accelerator for HPU devices.  | 
|
Accelerator for IPUs.  | 
|
Accelerator for TPU devices.  | 
callbacks
Finetune a backbone model based on a user-defined learning rate schedule.  | 
|
This class implements the base logic for writing your own Finetuning Callback.  | 
|
Base class to implement how the predictions should be stored.  | 
|
The   | 
|
Abstract base class used to build new callbacks.  | 
|
Automatically monitors and logs device stats during the training, validation, and testing stages.  | 
|
Monitor a metric and stop training when it stops improving.  | 
|
Change gradient accumulation factor according to scheduling.  | 
|
Create a simple callback on the fly using lambda functions.  | 
|
The   | 
|
Automatically monitors and logs the learning rate for learning rate schedulers during training.  | 
|
Save the model periodically by monitoring a quantity.  | 
|
Model pruning Callback, using PyTorch's prune utilities.  | 
|
Generates a summary of all layers in a   | 
|
Used to save a checkpoint on exception.  | 
|
The base class for progress bars in Lightning.  | 
|
Generates a summary of all layers in a   | 
|
Create a progress bar with rich text formatting.  | 
|
Implements the Stochastic Weight Averaging (SWA) Callback to average a model.  | 
|
The Timer callback tracks the time spent in the training, validation, and test loops and interrupts the Trainer if the given time limit for the training loop is reached.  | 
|
This is the default progress bar used by Lightning.  | 
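The early-stopping entry above ("Monitor a metric and stop training when it stops improving") describes a patience-style rule. The following is a minimal, framework-free sketch of that idea, not Lightning's implementation: the names `PatienceTracker`, `should_stop`, and `epochs_run` are hypothetical, and the `patience`, `min_delta`, and `mode` parameters are assumptions chosen for illustration.

```python
class PatienceTracker:
    """Hypothetical helper illustrating a patience rule; not part of Lightning."""

    def __init__(self, patience=3, min_delta=0.0, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience    # checks without improvement tolerated
        self.min_delta = min_delta  # smallest change that counts as improvement
        self.mode = mode
        self.best = None
        self.wait = 0

    def _improved(self, value):
        if self.best is None:
            return True
        if self.mode == "min":
            return value < self.best - self.min_delta
        return value > self.best + self.min_delta

    def should_stop(self, value):
        """Record one metric value and report whether training should stop."""
        if self._improved(value):
            self.best = value
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience


def epochs_run(losses, patience):
    """Return how many epochs run before the tracker requests a stop."""
    tracker = PatienceTracker(patience=patience)
    for epoch, loss in enumerate(losses, start=1):
        if tracker.should_stop(loss):
            return epoch
    return len(losses)
```

Calling `epochs_run` with a list of per-epoch validation losses shows where such a rule would cut a run short. A real callback would also need to decide when the metric is checked and how state is restored from a checkpoint, which this sketch leaves out.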
cli
Implementation of a configurable command line tool for pytorch-lightning.  | 
|
Extension of jsonargparse's ArgumentParser for pytorch-lightning.  | 
|
Saves a LightningCLI config to the log_dir when training starts.  | 
core
Hooks to be used with Checkpointing.  | 
|
Hooks to be used for data-related logic.  | 
|
Hooks to be used in LightningModule.  | 
|
A DataModule standardizes the training, val, test splits, data preparation and transforms.  | 
|
This class wraps the user's optimizers and properly handles the backward and optimizer_step logic across accelerators, AMP, and accumulate_grad_batches.  | 
loggers
Abstract base class used to build new loggers.  | 
|
Comet Logger  | 
|
CSV logger  | 
|
MLflow Logger  | 
|
Neptune Logger  | 
|
TensorBoard Logger  | 
|
Weights and Biases Logger  | 
plugins
precision
Precision plugin for DeepSpeed integration.  | 
|
Plugin for training with double (  | 
|
AMP for Fully Sharded Data Parallel (FSDP) Training.  | 
|
Plugin that enables bfloat/half support on HPUs.  | 
|
Precision plugin for IPU integration.  | 
|
Plugin for Automatic Mixed Precision (AMP) training with   | 
|
Base class for all plugins handling the precision-specific parts of the training.  | 
|
Plugin that enables bfloats on TPUs.  | 
|
Precision plugin for TPU integration.  | 
environments
Specification of a cluster environment.  | 
|
Environment for distributed training using the PyTorchJob operator from Kubeflow.  | 
|
The default environment used by Lightning for a single node or free cluster (not managed).  | 
|
An environment for running on clusters managed by the LSF resource manager.  | 
|
An environment for running on clusters with processes created through MPI.  | 
|
Cluster environment for training on a cluster managed by SLURM.  | 
|
Environment for fault-tolerant and elastic training with torchelastic.  | 
|
Cluster environment for training on a TPU Pod with the PyTorch/XLA library.  | 
io
  | 
|
Interface to save/load checkpoints as they are saved through the   | 
|
CheckpointIO to save checkpoints for HPU training strategies.  | 
|
CheckpointIO that utilizes   | 
|
CheckpointIO that utilizes   | 
others
Abstract base class for creating plugins that wrap layers of a model with synchronization logic for multiprocessing.  | 
|
A plugin that wraps all batch normalization layers of a model with synchronization logic for multiprocessing.  | 
profiler
This profiler uses Python's cProfile to record more detailed information about time spent in each function call recorded during a given action.  | 
|
This class should be used when you don't want the (small) overhead of profiling.  | 
|
If you wish to write a custom profiler, you should inherit from this class.  | 
|
This profiler uses PyTorch's Autograd Profiler and lets you inspect the cost of different operators inside your model.  | 
|
This profiler simply records the duration of actions (in seconds) and reports the mean duration of each action and the total time spent over the entire training run.  | 
|
XLA Profiler will help you debug and optimize training workload performance for your models using Cloud TPU performance tools.  | 
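The duration-recording profiler described above (record how long each action takes, then report the mean per action and the total) can be sketched with the standard library alone. This is an illustrative sketch, not Lightning's code: `DurationRecorder`, its `profile` context manager, `summary`, and the injectable `clock` argument are all hypothetical names introduced here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class DurationRecorder:
    """Hypothetical sketch of a duration-recording profiler; not Lightning's class."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock                  # injectable so tests can fake time
        self._durations = defaultdict(list)  # action name -> durations in seconds

    @contextmanager
    def profile(self, action_name):
        """Time the enclosed block and store its duration under action_name."""
        start = self._clock()
        try:
            yield
        finally:
            self._durations[action_name].append(self._clock() - start)

    def summary(self):
        """Return the call count, mean duration, and total duration per action."""
        return {
            name: {
                "calls": len(values),
                "mean": sum(values) / len(values),
                "total": sum(values),
            }
            for name, values in self._durations.items()
        }
```

Wrapping a block in `with recorder.profile("training_step"):` records one duration for that action. Passing the clock in as an argument is a design choice made for this sketch so that timing can be tested deterministically.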
strategies
Strategy for multi-process single-device training on one or multiple nodes.  | 
|
Provides capabilities to run training using the DeepSpeed library, with training optimizations for large billion parameter models.  | 
|
Strategy for Fully Sharded Data Parallel provided by torch.distributed.  | 
|
Strategy for distributed training on multiple HPU devices.  | 
|
Plugin for training on IPU devices.  | 
|
Plugin for training with multiple processes in parallel.  | 
|
Strategy that handles communication on a single device.  | 
|
Strategy for training on single HPU device.  | 
|
Strategy for training on a single TPU device.  | 
|
Base class for all strategies that change the behaviour of the training, validation, and test loops.  | 
|
Strategy for training multiple TPU devices using the   | 
utilities
Utilities that can be used with DeepSpeed.  | 
|
Utilities related to memory.  | 
|
Utilities used for parameter parsing.  | 
|
Utilities that can be used for calling functions on a particular rank.  | 
|
Utilities to help with reproducibility of models.  | 
|
Warning-related utilities.  |
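The rank utilities listed above cover "calling functions on a particular rank". One common shape for such a helper is a decorator that runs the wrapped function only on the process with global rank 0. The sketch below is an assumption-laden illustration, not Lightning's utility: `run_on_rank_zero` and `announce` are hypothetical names, and the rank is assumed to come from the `RANK` environment variable (as set by launchers such as torchrun), defaulting to 0 when it is unset.

```python
import functools
import os


def run_on_rank_zero(fn):
    """Hypothetical decorator: call fn only when the global rank is 0."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Assumption: the launcher exports RANK; a missing value is treated as rank 0.
        rank = int(os.environ.get("RANK", "0"))
        if rank == 0:
            return fn(*args, **kwargs)
        return None  # all other ranks skip the call

    return wrapper


@run_on_rank_zero
def announce(message):
    """Example function that should run once per job, not once per process."""
    return f"[rank 0] {message}"
```

Reading the rank at call time rather than at decoration time means the check still works if the environment is configured after the module is imported. A real library may determine the rank in a different way.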