Use efficient gradient accumulation
Learn how to perform efficient gradient accumulation in distributed settings
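In distributed settings, Fabric's no_backward_sync context manager lets you skip the costly gradient all-reduce on all but the last micro-batch of each accumulation window. A minimal sketch, assuming a two-device DDP setup, a toy model and dataset, and an accumulation factor of 4:

```python
import lightning as L
import torch

fabric = L.Fabric(accelerator="auto", devices=2, strategy="ddp")
fabric.launch()

model = torch.nn.Linear(32, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

# Toy data, just for illustration
dataset = torch.utils.data.TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
dataloader = fabric.setup_dataloaders(torch.utils.data.DataLoader(dataset, batch_size=8))

accumulate_grad_batches = 4

for batch_idx, (inputs, targets) in enumerate(dataloader):
    is_accumulating = (batch_idx + 1) % accumulate_grad_batches != 0
    # Skip the gradient sync across processes while we are still accumulating
    with fabric.no_backward_sync(model, enabled=is_accumulating):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        fabric.backward(loss)
    if not is_accumulating:
        optimizer.step()
        optimizer.zero_grad()
```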
Distributed communication
Learn all about communication primitives for distributed operation: gather, reduce, broadcast, and more.
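Fabric exposes the common collectives directly on the Fabric object. A minimal sketch, assuming two processes, that exercises all_gather, all_reduce, broadcast, and barrier:

```python
import lightning as L
import torch

fabric = L.Fabric(accelerator="auto", devices=2)
fabric.launch()

# Each process contributes a tensor holding its own rank
local = torch.tensor([float(fabric.global_rank)], device=fabric.device)

gathered = fabric.all_gather(local)                      # stack the tensor from every process
total = fabric.all_reduce(local, reduce_op="sum")        # element-wise sum across processes
message = fabric.broadcast("hello from rank 0", src=0)   # send an object from rank 0 to all ranks
fabric.barrier()                                         # wait until every process reaches this point

fabric.print(gathered, total, message)  # prints on rank 0 only
```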
Use multiple models and optimizers
See how flexibly Fabric handles multiple models and optimizers!
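fabric.setup() can be called once per model/optimizer pair, so independent components such as a generator and a discriminator train side by side. A minimal sketch with placeholder modules and deliberately dummy losses:

```python
import lightning as L
import torch

fabric = L.Fabric(accelerator="auto", devices=1)
fabric.launch()

# Placeholder generator and discriminator modules
generator = torch.nn.Linear(8, 32)
discriminator = torch.nn.Linear(32, 1)

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

# Call setup() once per model, together with the optimizer that updates it
generator, opt_g = fabric.setup(generator, opt_g)
discriminator, opt_d = fabric.setup(discriminator, opt_d)

# Illustrative alternating updates
noise = torch.randn(16, 8, device=fabric.device)
fake = generator(noise)

d_loss = discriminator(fake.detach()).mean()  # dummy discriminator loss
fabric.backward(d_loss)
opt_d.step()
opt_d.zero_grad()

g_loss = -discriminator(fake).mean()  # dummy generator loss
fabric.backward(g_loss)
opt_g.step()
opt_g.zero_grad()
```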
Train models with billions of parameters
Train the largest models with FSDP across multiple GPUs and machines
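With the FSDP strategy, Fabric shards parameters, gradients, and optimizer state across all devices so the full model never has to fit on a single GPU. A minimal sketch, assuming four GPUs; the transformer model and its sizes are placeholders, and the wrapping policy is illustrative:

```python
import lightning as L
import torch
from lightning.fabric.strategies import FSDPStrategy

# Wrap each transformer layer as its own FSDP unit so it can be sharded
strategy = FSDPStrategy(auto_wrap_policy={torch.nn.TransformerEncoderLayer})

fabric = L.Fabric(accelerator="cuda", devices=4, strategy=strategy)
fabric.launch()

with fabric.init_module():  # create parameters directly on the target device
    model = torch.nn.TransformerEncoder(
        torch.nn.TransformerEncoderLayer(d_model=1024, nhead=16),
        num_layers=32,
    )

model = fabric.setup_module(model)
# Create the optimizer after wrapping so it references the sharded parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
optimizer = fabric.setup_optimizers(optimizer)
```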