
Fabric (Beta)

Fabric allows you to scale any PyTorch model with just a few lines of code! You can run your model on distributed devices with the strategy of your choice, while keeping full control over the training loop and optimization logic.

With only a few changes to your code, Fabric gives you:

  • Automatic placement of models and data onto the device

  • Automatic support for mixed precision (speedup and smaller memory footprint); see the precision sketch after the example below

  • Seamless switching between hardware (CPU, GPU, TPU)

  • State-of-the-art distributed training strategies (DDP, FSDP, DeepSpeed)

  • An easy-to-use launch command for spawning processes (DDP, torchelastic, etc.); see the launch sketch after the example below

  • Multi-node support (TorchElastic, SLURM, and more)

  • Full control over your training loop

  import torch
  import torch.nn as nn
  from torch.utils.data import DataLoader, Dataset

+ from lightning.fabric import Fabric

  class PyTorchModel(nn.Module):
      ...

  class PyTorchDataset(Dataset):
      ...

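+ # Configure accelerator, devices, and strategy in one place; launch() starts the processes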
+ fabric = Fabric(accelerator="cuda", devices=8, strategy="ddp")
+ fabric.launch()

- device = "cuda" if torch.cuda.is_available() else "cpu"
  model = PyTorchModel(...)
  optimizer = torch.optim.SGD(model.parameters())
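+ # setup() moves the model to the device and prepares the optimizer for the chosen strategy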
+ model, optimizer = fabric.setup(model, optimizer)
  dataloader = DataLoader(PyTorchDataset(...), ...)
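+ # setup_dataloaders() adds a distributed sampler when needed and moves batches to the device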
+ dataloader = fabric.setup_dataloaders(dataloader)
  model.train()

  for epoch in range(num_epochs):
      for batch in dataloader:
          input, target = batch
-         input, target = input.to(device), target.to(device)
          optimizer.zero_grad()
          output = model(input)
          loss = loss_fn(output, target)
-         loss.backward()
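+         # replaces loss.backward(); handles precision scaling and gradient sync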
+         fabric.backward(loss)
          optimizer.step()

Note

Fabric is currently in Beta. Its API is subject to change based on feedback.
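
Because device placement, precision, and distribution are all configured through the Fabric constructor, switching hardware or turning on mixed precision is a one-line change rather than a rewrite of the training code. Below is a minimal, self-contained precision sketch of that idea; the precision strings (such as "bf16-mixed" or "16-mixed") are assumptions that may differ slightly between Lightning releases, so treat the values as placeholders.

  import torch
  import torch.nn as nn
  from lightning.fabric import Fabric

  # Choose hardware and precision here; the training code below stays the same.
  #   Debug on CPU:          Fabric(accelerator="cpu", devices=1)
  #   8 GPUs, DDP, 16-bit:   Fabric(accelerator="cuda", devices=8, strategy="ddp", precision="16-mixed")
  fabric = Fabric(accelerator="cpu", devices=1, precision="bf16-mixed")
  fabric.launch()

  model = nn.Linear(32, 2)
  optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

  # setup() moves the model to the selected device and applies the precision plugin.
  model, optimizer = fabric.setup(model, optimizer)

  # Tensors that do not come from a set-up dataloader can be moved explicitly.
  batch = fabric.to_device(torch.randn(8, 32))

  loss = model(batch).sum()
  fabric.backward(loss)  # applies the casting/scaling required by the chosen precision
  optimizer.step()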
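
Process spawning can also be driven from Python by handing a training function to launch(). The launch sketch below assumes the function-argument form of fabric.launch(), in which each spawned process calls the function with its own Fabric instance; treat it as an illustration rather than a definitive reference.

  import torch
  import torch.nn as nn
  from lightning.fabric import Fabric


  def train(fabric):
      # Every spawned process runs this function with its own Fabric instance.
      model = nn.Linear(32, 2)
      optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
      model, optimizer = fabric.setup(model, optimizer)

      batch = fabric.to_device(torch.randn(8, 32))
      loss = model(batch).sum()
      fabric.backward(loss)
      optimizer.step()
      fabric.print(f"rank {fabric.global_rank} done, loss={loss.item():.4f}")  # prints on rank 0 only


  if __name__ == "__main__":
      # Two CPU processes with DDP; swap in accelerator="cuda" on a GPU machine.
      fabric = Fabric(accelerator="cpu", devices=2, strategy="ddp")
      fabric.launch(train)

The same script can also be left as-is and started from Fabric's command-line launcher, which spawns one process per device; the command name has changed across releases (lightning run model in early versions, fabric run later), so check the documentation of your installed version.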


Fundamentals


Build Your Own Trainer


Advanced Topics


Examples


API