Dear Fabric Team,
Glad to see Fabric offering more expert-level control over PyTorch code. Considering that Fabric is designed with multi-billion-parameter models in mind, I am wondering what the difference is between Fabric and Lightning when it comes to LLMs. Thanks so much!
Hello @Hannibal046
There isn’t really a difference in what size of models you can train. Both work well for small and large models alike. Both support strategies like FSDP and DeepSpeed, and with those you are already well equipped. The phrase “designed with multi-billion parameter models in mind” is, in my view, more an assurance that you can do that kind of training with Fabric too. As an example, we are building Lit-LLaMA on Fabric for training and finetuning: Lightning-AI/lit-llama on GitHub, an implementation of the LLaMA language model based on nanoGPT. It supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training, and it is Apache 2.0-licensed.
That being said, there are some rough edges and we are working on improving the usability. These two items are at the top of our list:
- An easier way to instantiate large models without relying on third-party tools
- Built-in checkpoint saving and loading for FSDP, so you can avoid boilerplate code
Specifically for LLM training, the training loop is very standard, so there is usually no extra control needed in that part. Rule of thumb: choose Fabric if you want to build your own trainer or stay closer to PyTorch for more control; choose the Lightning Trainer if you want to iterate quickly without writing much code and keep your project organized more easily.