
How to Contribute to Lit-GPT and Lit-LLaMA

Takeaways

Learn about the variety of ways to contribute to Lit-LLaMA, Lightning AI’s open-source rewrite of LLaMA.

What is Lit-LLaMA 🦙

Lit-LLaMA is an exciting new rewrite of LLaMA that utilizes Lightning Fabric to scale PyTorch code. Its main focus is on improving code readability and optimization for running on consumer GPUs.

One of the highlights of Lit-LLaMA is that it is released under the Apache 2.0 license, which means it is more accessible to other deep learning projects that use similar licenses. Additionally, this license enables commercial use, making it an ideal choice for businesses looking to incorporate deep learning technology into their products.

 

The Value of Contributing

Contributing to impactful open-source projects brings significant personal benefits.

Aside from driving progress in the world of artificial intelligence, you’ll have the opportunity to build new skills or enhance existing ones. By collaborating with a diverse community of exceptional individuals from different corners of the world, you’ll learn from unique experiences and perspectives.

Thanks to the ease of collaboration across borders and cultures, we can coordinate efforts like never before. Open-source collaboration is also critical to managing our resources effectively, because the models we train are energy- and cost-intensive.

Building and Maintaining Skills

The term “skills” is quite broad, as each reader may come from different backgrounds and have various experiences. You may be an AI researcher exploring new activation methods or optimizers with a deep understanding of low-level primitives. Or, you might be a hard science practitioner comparing “classic” methods to those of neural nets. Perhaps you’re an expert in MATLAB or Julia, and you’re looking for a credible open-source project to assess what great Python frameworks look like. Or, you may simply be a “hobbyist” with a passion for applied learning and excelling at anything you put your mind to.

As you can see, the reasons for being here are diverse, but open-source contributing offers avenues for improvement and learning for everyone. Even seasoned academics and practitioners can unblock their work and learn something new, like a more efficient method of solving technical issues in their code or a new CI/CD feature.

Those new to artificial intelligence and machine learning can also uncover what interests them most about the field. The possibilities here are as varied as the term “skills.” By contributing to open-source projects and learning in the open, you can identify your interests and receive credible feedback along the way.

Community

Being part of the open-source community is crucial to achieving quality learning outcomes and receiving credible feedback.

In the 2023 learning landscape, community is ubiquitous. Whether it’s GitHub Discussions, Stack Overflow, YouTube, or Discord, learning about a topic means engaging with a community rather than learning in a silo.

As mentioned before, a diverse community is invaluable for open-source contributors. Just as the personal benefits of contributing cannot be overstated, neither can the benefits of connecting with a community that encompasses a variety of skill levels and backgrounds. Contributing in a communal setting provides the opportunity to connect with others who have traveled a similar path and are eager to help you advance.

Interacting with Core Maintainers and Contributors

Contributing to impactful open-source projects not only allows you to interact with a diverse community, but also with the core team behind that particular project. This interaction is equally essential.

Core maintainers and contributors work alongside community contributors who submit issues and pull requests. Core maintainers are typically code owners, while core contributors are individuals who have consistently and significantly contributed over time and have earned this title from the organization.

As you submit your pull request, core maintainers and contributors will guide you through the process by offering feedback and suggestions. Contributing to open-source projects is an excellent way to receive feedback from some of the world’s leading research engineers.

Anyone from the community has the opportunity to become a core contributor. The key is to consistently provide high-quality and dependable contributions. You can check out the guidelines for becoming a core contributor and start your journey today!

Verifiable Work

Submitting code contributions to popular open-source projects as pull requests on GitHub provides a verifiable way to demonstrate your ability to write maintainable code. This is a valuable feature to leverage, as it complements any end-to-end portfolio project very well.

When your code is merged into the project, it shows that it met the submission criteria set by the core maintainers. It also demonstrates that you can collaborate in a distributed team to resolve code issues and are proficient in Git-based workflows.

Driving Progress in Open-Source AI

Open-source AI projects are a transparent way to create and maintain the technology that will have a lasting impact on our world, across generations to come. Progress needs to be accessible to all and made in the open, not in walled gardens by exclusive groups. Contributing to open-source projects means you will have helped to push the world forward, in the direction of progress, no matter how simple or complex your contribution is.

Research and Checkpoint Contributions

The world of open-source artificial intelligence and machine learning is rapidly evolving, and so are the types of contributions needed to keep up with this ever-changing landscape.

Contributing to open-source AI projects still involves hot-fixes for bugs and implementing new features. However, it has also expanded to include sharing reproducible research methods in the form of source code, training checkpoints, and tuning best practices.

In Lit-LLaMA, reproducible research contributions include pre-training and fine-tuning. These contributions enable the community to use Lit-LLaMA in a variety of applications and hardware, including consumer GPUs.

Accelerating LLaMA with Fabric is an excellent article to read as you prepare a reproducible contribution.

Pre-training

Pre-training means training the model on a (generally self-supervised) task that is designed to make the model more amenable to being fine-tuned on specific tasks like code completion or Q&A. Pre-training will allow the community to leverage the collective mind and efforts of those involved, if the model is released under an adequately permissive license. An end-goal of this contribution effort is to create a truly open-source model that can reliably be used for inference.

train.py has been provided in Lit-LLaMA as a starting point for pre-training.

Our CTO’s blog Why Lit-LLaMA dives into the licensing complexity of LLaMA and why pre-training is an essential part of the march towards open-source LLMs.

Fine-tuning

Fine-tuning involves tuning a pre-trained model on new data for a task that it was not trained on. Currently, fine-tuning Lit-LLaMA requires requesting access to the tokenizer and checkpoint from Meta, and any fine-tuned derivative work built from those resources is subject to LLaMA’s licensing. When fully open-sourced weights become available, these restrictions will go away.

Fine-tuning with Lit-LLaMA is enabled by way of LoRA and LLaMA-Adapter. Two scripts, finetune_lora.py and finetune_adapters.py, are provided so that users can begin instruction-tuning with either method on the Stanford Alpaca dataset.

You can read more on fine-tuning Lit-LLaMA in the post, Understanding Parameter-Efficient Finetuning of Large Language Models.

Preparing to Contribute Reproducible Research

Preparing to contribute reproducible research can begin with forking a repo in the GitHub web app and then cloning your fork from GitHub, or by simply cloning the source repository to your local machine if you plan to use Lit-LLaMA in a more bespoke manner.

Let’s fork the source repo without changing the repository name, and then clone it into a directory named Developer by following the instructions below.

Not changing the name of your fork and using a directory named Developer is a suggestion, not a requirement. If you are using a directory name other than Developer, please be sure to replace Developer with your directory name or full path in the instructions below, and also be sure to replace lit-llama with the name of your fork.

Let’s clone the forked repository into our desired location with:


cd ~/Developer
git clone https://github.com/YOUR_GITHUB_USERNAME/lit-llama.git

To verify that the clone was successful, check that lit-llama exists as a directory. If it does, lit-llama will appear in the listing printed to the terminal. To check, run the following in the terminal:


ls ~/Developer

As long as lit-llama appears in the output, you can open the repository in your code editor or IDE and begin working on your applied contribution.
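Listing the directory only shows that a folder exists. A slightly stricter check, sketched below, asks git itself whether the folder is a repository. The helper name check_clone is hypothetical and is not part of Lit-LLaMA; the path in the usage comment assumes the Developer directory suggested above.

```shell
# check_clone REPO_DIR
# Prints a short status line and returns non-zero if REPO_DIR
# is not inside a git working tree.
check_clone() {
    repo_dir="$1"
    if git -C "$repo_dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        echo "ok: $repo_dir is a git repository"
    else
        echo "missing: $repo_dir is not a git repository"
        return 1
    fi
}

# Example usage (adjust the path if you cloned somewhere else):
#   check_clone ~/Developer/lit-llama
```

If the helper reports that the directory is missing, re-run the clone command above and check the output for errors.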

Source Code Contributions

Contributing to the source code can vary from helping to improve docstrings to submitting issues for bugs and resolving those issues with pull requests.

How to Contribute to Lightning covers how to find issues to work on, asking to be assigned to an issue, and foundational concepts of pull requests.

Our Contributing page contains our design principles, additional contribution types, and guidelines on coding style, documentation, and testing. The Contributing page is especially helpful given this document drives best practices across our frameworks and implementations, and will help guide your PR to success.

Reporting Issues and Submitting Pull Requests

Open-source implementations are improved in quality when contributors provide feedback by raising issues in GitHub. Reporting issues in GitHub is the primary method of beginning a contribution and the flow looks something like this:

  1. encounter a bug or notice an error in documentation
  2. submit the issue in GitHub
  3. volunteer to resolve that issue
  4. be assigned by a core maintainer
  5. fork and clone the repo
  6. make initial changes to the source code on a development branch in your fork
  7. open a PR in the source repository
  8. work with the core maintainers to merge the PR

The next section will help you prepare for steps 5 – 8 above.

Preparing to Contribute to the Source Code

Preparing to contribute to the source code is very similar to preparing to contribute to training or tuning, in that the process starts with forking the repo in the GitHub web app and then cloning your fork from GitHub. An advantage of this method is that it will immediately create a remote repo for you to push your changes to, as we cannot push directly to the source repo.

If you are new to the terminal, watch Lightning Bit’s Terminal Commands to become familiar with the commands for creating and navigating working directories. If you follow along with the video, you will have created a directory named Developer in your home directory and will be ready to clone the repo.

A directory named Developer is a suggestion, not a requirement. If you are using a directory name other than Developer, please be sure to replace Developer with your directory name or full path in the instructions below.

Let’s clone the repo in terminal with:


cd ~/Developer
git clone https://github.com/YOUR_GITHUB_USERNAME/lit-llama.git
cd lit-llama

Now, let’s check that the fork of Lit-LLaMA is the origin remote with the following command in terminal:


git remote -v

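Because we cloned over HTTPS, running the above command should display the following (if you cloned over SSH instead, the URLs will begin with git@github.com: rather than https://github.com/):


origin https://github.com/YOUR_GITHUB_USERNAME/lit-llama.git (fetch)
origin https://github.com/YOUR_GITHUB_USERNAME/lit-llama.git (push)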
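Since origin points at your fork, it can be handy to register the source repository as a second remote so you can pull in new changes later. The sketch below is one way to do that. The helper name add_upstream is hypothetical, the remote name upstream is a common convention rather than a requirement, and the URL in the usage comment assumes the source repository lives under the Lightning-AI organization on GitHub.

```shell
# add_upstream REPO_DIR UPSTREAM_URL
# Registers UPSTREAM_URL as a remote named "upstream" in REPO_DIR
# (only if no such remote exists yet), then lists all remotes.
add_upstream() {
    repo_dir="$1"
    upstream_url="$2"
    if ! git -C "$repo_dir" remote get-url upstream >/dev/null 2>&1; then
        git -C "$repo_dir" remote add upstream "$upstream_url"
    fi
    git -C "$repo_dir" remote -v
}

# Example usage (URL is an assumption; confirm it on GitHub first):
#   add_upstream ~/Developer/lit-llama https://github.com/Lightning-AI/lit-llama.git
#   git -C ~/Developer/lit-llama fetch upstream
```

After this, git remote -v should list both origin (your fork) and upstream (the source repo).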

The final step is to create a development branch for the PR and switch to it. Let’s do that with:


git checkout -b YOUR_BRANCH_NAME

Awesome! You can now open your favorite code editor and start working after you’ve created your virtual environment with conda or venv.
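Once you have made some changes, steps 6 and 7 of the flow above come down to committing on your development branch and pushing it to your fork. The sketch below shows one way to do that. The helper name push_current_branch is hypothetical, and it assumes your development branch is already checked out (run git checkout YOUR_BRANCH_NAME first if it is not) and that origin is your fork.

```shell
# push_current_branch REPO_DIR COMMIT_MESSAGE
# Stages every change in REPO_DIR, commits it on the branch that is
# currently checked out, and pushes that branch to the origin remote.
push_current_branch() {
    repo_dir="$1"
    message="$2"
    # Name of the currently checked-out branch; fails on a detached HEAD.
    branch="$(git -C "$repo_dir" symbolic-ref --short HEAD)" || return 1
    git -C "$repo_dir" add --all &&
    git -C "$repo_dir" commit -m "$message" &&
    git -C "$repo_dir" push --set-upstream origin "$branch"
}

# Example usage:
#   push_current_branch ~/Developer/lit-llama "Fix typo in docstring"
```

With the branch pushed to your fork, GitHub will offer to open a pull request against the source repository.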

Need more help with git? There’s a video for that → Version Control Using Git.

Individual Contributors

The types of contributions that individuals can make are determined by the hardware they have access to. While it may not be possible to train production-quality checkpoints from scratch on a laptop, contributions from community members attempting to train and contribute on a variety of systems can help identify edge cases in the code base to report as issues. These contributions are both helpful and meaningful.

Contributions to the source code will mostly depend on how the individual contributor is using Lit-LLaMA or if they are capable of implementing a new feature. If you need an overview of Lit-LLaMA training and fine-tuning, you can start with the Accelerating LLaMA with Fabric post and submit issues in GitHub or ask questions in our community Discord as you make progress.

Join the Community

We have a vibrant and highly skilled international community gathering on GitHub in Discussions and on our Discord. We would love to have you join us as we build and grow together in our quest for truly open-source AI.

We welcome all individual contributors, regardless of their level of experience or hardware. Your contributions are valuable, and we are excited to see what you can accomplish in this collaborative and supportive environment.

Happy building and contributing!