Fabric allows you to scale any PyTorch model with just a few lines of code. It runs your model on distributed devices using the strategy of your choice, while you keep full control over the training loop and optimization logic.
With only a few changes to your code, Fabric gives you:

- Automatic placement of models and data onto the correct device
- Automatic support for mixed precision (speedup and smaller memory footprint)
- Seamless switching between hardware (CPU, GPU, TPU)
- State-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed)
- An easy-to-use launch command for spawning processes (DDP, torchelastic, etc.)
- Multi-node support (TorchElastic, SLURM, and more)
- Full control over your training loop

The sketch below shows how these options map onto the `Fabric` constructor.
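A minimal configuration sketch follows. The `accelerator`, `devices`, and `strategy` arguments appear in the diff further down; `num_nodes` and the `precision="16-mixed"` spelling are the Lightning 2.x API, so treat the exact strings as assumptions if you run an older release.

```python
# Minimal sketch of selecting hardware, strategy, and precision with Fabric.
from lightning.fabric import Fabric

fabric = Fabric(
    accelerator="cuda",    # or "cpu", "tpu", "auto"
    devices=8,             # number of devices per node
    num_nodes=1,           # increase for multi-node (SLURM, TorchElastic)
    strategy="ddp",        # or "fsdp", "deepspeed"
    precision="16-mixed",  # mixed precision: faster, smaller memory footprint
)
fabric.launch()  # spawns one process per device
```

Depending on your installed version, the launcher CLI is spelled `fabric run` (recent releases) or `lightning run model` (older ones); `torchrun` and SLURM launchers also work. Treat the exact command names as assumptions and check `--help` for your version.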
```diff
  import torch
  import torch.nn as nn
  from torch.utils.data import DataLoader, Dataset
+ from lightning.fabric import Fabric

  class PyTorchModel(nn.Module):
      ...

  class PyTorchDataset(Dataset):
      ...

+ fabric = Fabric(accelerator="cuda", devices=8, strategy="ddp")
+ fabric.launch()

- device = "cuda" if torch.cuda.is_available() else "cpu"
  model = PyTorchModel(...)
  optimizer = torch.optim.SGD(model.parameters())
+ model, optimizer = fabric.setup(model, optimizer)
  dataloader = DataLoader(PyTorchDataset(...), ...)
+ dataloader = fabric.setup_dataloaders(dataloader)

  model.train()
  for epoch in range(num_epochs):
      for batch in dataloader:
          input, target = batch
-         input, target = input.to(device), target.to(device)
          optimizer.zero_grad()
          output = model(input)
          loss = loss_fn(output, target)
-         loss.backward()
+         fabric.backward(loss)
          optimizer.step()
```
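Put together, a self-contained version of the converted script looks like the sketch below. The toy `LinearModel`, `RandomDataset`, and hyperparameters are hypothetical stand-ins for the `PyTorchModel` and `PyTorchDataset` placeholders above; the Fabric calls are the ones shown in the diff.

```python
# Runnable sketch of the converted training loop from the diff above.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

from lightning.fabric import Fabric


class LinearModel(nn.Module):  # hypothetical stand-in for PyTorchModel
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)


class RandomDataset(Dataset):  # hypothetical stand-in for PyTorchDataset
    def __len__(self):
        return 256

    def __getitem__(self, index):
        return torch.randn(32), torch.randint(0, 2, (1,)).squeeze()


fabric = Fabric(accelerator="auto", devices=1)  # scale via devices/strategy
fabric.launch()

model = LinearModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model, optimizer = fabric.setup(model, optimizer)  # places model on the device

dataloader = fabric.setup_dataloaders(DataLoader(RandomDataset(), batch_size=16))
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(2):
    for input, target in dataloader:  # batches arrive on the right device
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        fabric.backward(loss)  # replaces loss.backward()
        optimizer.step()
```

Note that `fabric.setup_dataloaders` also installs a distributed sampler when a multi-process strategy is active, so batches arrive sharded and on the correct device without any manual `.to(device)` calls.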
Fabric is currently in Beta. Its API is subject to change based on feedback.
Examples:

- Build Your Own Trainer
- Train a GAN that generates realistic human faces
- Distributed training with the MAML algorithm on the Omniglot and MiniImagenet datasets