Glossary

Accelerators

Accelerators connect the Trainer to hardware to train faster

Callback

Add self-contained extra functionality during training execution

Checkpointing

Save and load progress with checkpoints

Cluster

Run on your own group of servers

Cloud checkpoint

Save your models to cloud filesystems

Console Logging

Capture more visible logs

Debugging

Fix errors in your code

DeepSpeed

Distribute models with billions of parameters across hundreds of GPUs

Early stopping

Stop the training when no improvement is observed
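
As a rough illustration of the idea, the sketch below scans a list of validation losses and reports the epoch at which training would stop. It is a minimal sketch, not Lightning's own callback: the function name `early_stop_epoch` and the sample losses are hypothetical, and `patience` and `min_delta` are used here in their usual sense (how many non-improving epochs to tolerate, and how large a change counts as an improvement).

```python
def early_stop_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None.

    Hypothetical helper for illustration only.
    """
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value is reached at epoch 2.
losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.66, 0.64]
print(early_stop_epoch(losses, patience=3))  # 5
```

Note that the run stops at epoch 5 and never sees the better loss at epoch 6, which is the usual trade-off when choosing a small `patience`.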

Experiment manager (Logger)

Tools for tracking and visualizing artifacts and logs

Finetuning

Technique for further training a pretrained model on new data or a new task

FSDP

Distribute models with billions of parameters across hundreds of GPUs

GPU

Graphics Processing Unit for faster training

Half precision

Using different numerical formats to save memory and run faster
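
To make the memory trade-off concrete, the sketch below stores the same value in 16-, 32- and 64-bit floating point using only Python's standard `struct` module. The helper name `roundtrip` is hypothetical. It only shows storage size and rounding error, not how a training framework applies mixed precision.

```python
import struct


def roundtrip(value, fmt):
    """Pack `value` in the given struct float format and read it back.

    Returns (size in bytes, restored value). Illustrative helper only.
    """
    packed = struct.pack("<" + fmt, value)
    return len(packed), struct.unpack("<" + fmt, packed)[0]


# "e" is IEEE 754 half precision, "f" single, "d" double.
for name, fmt in [("float16", "e"), ("float32", "f"), ("float64", "d")]:
    size, restored = roundtrip(0.1, fmt)
    print(f"{name}: {size} bytes, error {abs(restored - 0.1):.2e}")
```

Halving the number of bytes per value halves the memory needed for the same tensor, at the cost of a larger rounding error.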

HPU

Habana Gaudi AI Processor Unit for faster training

Inference

Making predictions by applying a trained model to unlabeled examples

IPU

Graphcore Intelligence Processing Unit for faster training

Lightning CLI

A Command-line Interface (CLI) to interact with Lightning code via a terminal

LightningDataModule

A shareable, reusable class that encapsulates all the steps needed to process data

LightningModule

A base class for organizing your neural network module

Log

Outputs or results used for visualization and tracking

Metrics

A statistic used to measure performance or other objectives we want to optimize

Model

The set of parameters and structure for a system to make predictions

Model Parallelism

A way to scale training that splits a model between multiple devices

Plugins

Custom trainer integrations such as custom precision, checkpointing or cluster environment implementation

Progress bar

Output printed to the terminal to visualize the progression of training

Production

Using ML models in real world systems

Prediction

Computing a model's output

Pretrained models

Models that have already been trained for a particular task

Profiler

Tool to measure the performance of different parts of a model and identify bottlenecks

Pruning

A technique to eliminate some of the model weights to reduce the model size and decrease inference requirements
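
One common way to choose which weights to eliminate is by magnitude: the weights closest to zero are assumed to matter least. The sketch below applies that rule to a plain list of numbers. The entry above does not say which pruning method is meant, so magnitude pruning is an assumption here, and `magnitude_prune` is a hypothetical helper rather than a library function.

```python
def magnitude_prune(weights, amount):
    """Zero out the `amount` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * amount)
    # Indices of the k weights closest to zero.
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
print(magnitude_prune(weights, amount=0.5))
# [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

The zeroed entries can then be stored or computed sparsely, which is where the size and inference savings come from.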

Quantization

A technique to speed up model inference and reduce memory load while largely maintaining model accuracy
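
A minimal sketch of the idea, assuming symmetric 8-bit quantization (one of several possible schemes): each float is mapped to an integer in [-127, 127] with a single shared scale, then mapped back to see how much accuracy is lost. The function names are hypothetical and the weights are made up.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [x * scale for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"max error {max_error:.4f}")
```

Each value now fits in one byte instead of four, and the reconstruction error stays within half a quantization step.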

Remote filesystem and FSSPEC

Accessing files from cloud storage providers

Strategy

Ways the trainer controls the model distribution across training, evaluation, and prediction

Strategy registry

A class that holds information about training strategies and allows adding new custom strategies

Style guide

Best practices to improve readability and reproducibility

SWA

Stochastic Weight Averaging (SWA) can make your models generalize better
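
The averaging step at the core of SWA can be sketched as a running mean over weight snapshots taken during training. The code below shows only that step, with made-up snapshots and a hypothetical helper `update_swa`. A full SWA setup also involves choices such as the learning-rate schedule and when averaging starts, which are not shown.

```python
def update_swa(swa_weights, new_weights, n_averaged):
    """Fold one more weight snapshot into a running average.

    `n_averaged` is the number of snapshots already included.
    """
    return [s + (w - s) / (n_averaged + 1) for s, w in zip(swa_weights, new_weights)]


# Made-up weight snapshots from three points late in training.
snapshots = [[1.0, 4.0], [2.0, 2.0], [3.0, 0.0]]
swa = snapshots[0]
for n, weights in enumerate(snapshots[1:], start=1):
    swa = update_swa(swa, weights, n)
print(swa)  # [2.0, 2.0]
```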

SLURM

Simple Linux Utility for Resource Management, or simply Slurm, is a free and open-source job scheduler for Linux clusters

Transfer learning

Using pretrained models to improve learning on a new task

Trainer

The class that automates and customizes model training

Torch distributed

Setup for running on distributed environments

Warnings

Disable false-positive warnings emitted by Lightning