{"id":5647703,"date":"2023-04-06T12:45:16","date_gmt":"2023-04-06T16:45:16","guid":{"rendered":"https:\/\/lightning.ai\/pages\/?p=5647703"},"modified":"2023-06-22T13:30:45","modified_gmt":"2023-06-22T17:30:45","slug":"accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama","status":"publish","type":"post","link":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/","title":{"rendered":"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA"},"content":{"rendered":"<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Takeaways<\/h3> In this tutorial, we will learn how to train and fine-tune LLaMA (Large Language Model Meta AI). Lit-LLaMA, a rewrite of LLaMA, can run inference on an 8 GB consumer GPU. We will also discover how it utilizes Lightning Fabric to accelerate the PyTorch code. <\/div>\n<h2>What is LLaMA \ud83e\udd99<\/h2>\n<p>LLaMA is a foundational large language model that has been released by Meta AI.<\/p>\n<p>LLaMA comes in four size variants: 7B, 13B, 33B, and 65B parameters. The paper shows that training smaller foundation models on large enough tokens is desirable, as it requires less computing power and resources. The 65B parameter models have been trained on 1.4 trillion tokens, while the LLaMA 7B model has been trained on 1 trillion tokens.<\/p>\n<p>Just a few weeks after the release of LLaMA, the open-source community embraced it by creating an optimized version and expanding its use cases. 
Now, you can fine-tune LLaMA using LoRA (which reduces the number of trainable parameters for fine-tuning) and train a chatbot with <a href=\"https:\/\/crfm.stanford.edu\/2023\/03\/13\/alpaca.html\">Stanford Alpaca<\/a>.<\/p>\n<p>Lightning AI has also joined the trend by providing an open-source, from-scratch rewrite of LLaMA called <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\">Lit-LLaMA<\/a>. The main highlight of Lit-LLaMA is that it is released under the Apache 2.0 license, which makes it easier to adopt in other deep learning projects that use similarly permissive licenses and also enables commercial use. It also ships scripts for optimized training and fine-tuning with <a href=\"https:\/\/github.com\/microsoft\/LoRA\">LoRA<\/a>.<\/p>\n<h2>Lit-LLaMA: simple, optimized, and completely open-source <span role=\"img\" aria-label=\"\ud83d\udd25\">\ud83d\udd25<\/span><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-5647704 aligncenter\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/Untitled-4.png\" alt=\"\" width=\"258\" height=\"295\" \/><\/p>\n<p>Lit-LLaMA is a from-scratch rewrite of LLaMA that uses <a href=\"https:\/\/lightning.ai\/pages\/blog\/accelerate-pytorch-code-with-fabric\/\">Lightning Fabric<\/a> for scaling PyTorch code. It focuses on code readability and on optimizations that let it run on consumer GPUs. At the time of writing, you can run Lit-LLaMA on GPUs with 8 GB of memory \ud83e\udd2f.<\/p>\n<p><em>Note: Currently, you need to download the official Meta AI LLaMA pre-trained weights to fine-tune the model or run inference.<\/em><\/p>\n<p>Lit-LLaMA supports training, fine-tuning, and inference. 
Let&#8217;s discuss each functionality in detail.<\/p>\n<h3>Training LLaMA<\/h3>\n<p><em>Note: We won\u2019t go into too much detail about training LLaMA from scratch and will instead focus on fine-tuning and inference, because the compute required for training is not available to everyone in the community.<\/em><\/p>\n<p>The repo comes with a simple and readable <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/lit_llama\/model.py\">LLaMA model implementation<\/a> and a <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/train.py\">training script<\/a> accelerated by Fabric.<\/p>\n<p>Large language models may not fit on a single GPU. Fully Sharded Data Parallelism (FSDP) is a technique that shards model parameters, gradients, and optimizer states across data-parallel workers. Fabric provides a unified API that makes it easy to use FSDP.<\/p>\n<p>To use FSDP with Fabric, create an <code>FSDPStrategy<\/code> object by specifying the <a href=\"https:\/\/pytorch.org\/blog\/introducing-pytorch-fully-sharded-data-parallel-api\/#using-fsdp-in-pytorch\">auto-wrap policy<\/a> and pass it as an argument to the <code>Fabric<\/code> class.<\/p>\n<p>Fabric automatically places the model and tensors on the correct devices and enables distributed training, mixed precision, and selecting the number of devices to train on.<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">import lightning as L\nimport torch\nfrom functools import partial\nfrom lightning.fabric.strategies import FSDPStrategy\nfrom torch.distributed.fsdp.wrap import transformer_auto_wrap_policy\n\nfrom lit_llama.model import Block, LLaMA, LLaMAConfig\n\n\ndef main():\n    # \u26a1\ufe0f Initialize the FSDP strategy \u26a1\ufe0f\n    auto_wrap_policy = partial(transformer_auto_wrap_policy, transformer_layer_cls={Block})\n    strategy = FSDPStrategy(auto_wrap_policy=auto_wrap_policy, activation_checkpointing=Block)\n\n    # \u26a1\ufe0f Initialize Fabric: 4 GPUs, bf16 mixed precision, FSDP strategy \u26a1\ufe0f\n    fabric = L.Fabric(accelerator=\"cuda\", devices=4, precision=\"bf16-mixed\", strategy=strategy)\n    fabric.launch()\n\n    # Load data\n    train_data, val_data = load_datasets()\n\n    # Load the model config\n    config = LLaMAConfig.from_name(\"7B\")\n    config.block_size = block_size\n    config.vocab_size = 100  # from prepare_shakespeare.py\n\n    # \u26a1\ufe0f Initialize the model on the Fabric device \u26a1\ufe0f\n    with fabric.device:\n        model = LLaMA(config)\n\n    # \u26a1\ufe0f Set up the model and optimizer for distributed training \u26a1\ufe0f\n    model = fabric.setup_module(model)\n    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=weight_decay, betas=(beta1, beta2))\n    optimizer = fabric.setup_optimizers(optimizer)\n\n    train(fabric, model, optimizer, train_data, val_data)<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>You can find the full training code <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/dda6f800a7efd162ed034cdb451d1670344f5713\/train.py#L43\">here<\/a>.<\/p>\n<h3>Fine-tuning LLaMA<\/h3>\n<p>Within just weeks of launching LLaMA, the community began optimizing and building upon it. Fine-tuning LLaMA on consumer GPUs is crucial to truly democratize LLMs. LLMs can be fine-tuned into chatbots or specialized for particular tasks or domains, such as summarizing legal or financial documents.<\/p>\n<p>Lit-LLaMA includes a <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/finetune_lora.py\">fine-tuning script<\/a> that utilizes LoRA (Low-Rank Adaptation of Large Language Models). LoRA freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture. 
This significantly reduces the number of trainable parameters for downstream tasks.<\/p>\n<p>To initialize LLaMA with LoRA layers, we use the context manager from <code>lit_llama.lora<\/code>:<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">from lit_llama.lora import lora, mark_only_lora_as_trainable\nfrom lit_llama.model import LLaMA, LLaMAConfig\n\n# initialize configs (values from the fine-tuning script)\nlora_r = 8\nlora_alpha = 16\nlora_dropout = 0.05\nconfig = LLaMAConfig.from_name(\"7B\")\nconfig.block_size = block_size\n\n# initialize the model with LoRA\nwith lora(r=lora_r, alpha=lora_alpha, dropout=lora_dropout, enabled=True):\n    model = LLaMA(config)\n\n# mark only the LoRA-injected layers as trainable\nmark_only_lora_as_trainable(model)<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>Note that <code>lora<\/code> is a Python context manager that swaps <code>CausalSelfAttention<\/code> for a LoRA-injected variant while the model is instantiated:<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">from contextlib import contextmanager\n\n\n@contextmanager\ndef lora(r, alpha, dropout, enabled: bool = True):\n    \"\"\"A context manager under which you can instantiate the model with LoRA.\"\"\"\n    if not enabled:\n        yield\n        return\n\n    LoRACausalSelfAttention.lora_config = LoRAConfig(r=r, alpha=alpha, dropout=dropout)\n    # swap in the LoRA-enabled attention class while the model is built\n    causal_self_attention = llama.CausalSelfAttention\n    llama.CausalSelfAttention = LoRACausalSelfAttention\n    yield\n    # restore the original attention class afterwards\n    llama.CausalSelfAttention = causal_self_attention\n    LoRACausalSelfAttention.lora_config = None<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>We are now ready to fine-tune LLaMA on the Alpaca dataset, using LoRA and quantization, on a consumer GPU. 
Lit-LLaMA comes with a simple script for downloading and preparing the Alpaca dataset, which you can find <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/scripts\/prepare_alpaca.py\">here<\/a>.<\/p>\n<blockquote><p>Note: You can convert the official Meta AI LLaMA weights to the Lit-LLaMA format using the instructions <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/tree\/main#use-the-model\">here<\/a>.<\/p><\/blockquote>\n<p>Follow these two simple steps to instruction-tune LLaMA:<\/p>\n<ol>\n<li>Download the data and generate the instruction-tuning dataset: <code>python scripts\/prepare_alpaca.py<\/code><\/li>\n<li>Run the fine-tuning script: <code>python finetune_lora.py<\/code><\/li>\n<\/ol>\n<p>Find the full fine-tuning code <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/finetune_lora.py\">here<\/a>.<\/p>\n<h3>Generating text from a trained model<\/h3>\n<p>To generate text predictions, you will need trained model weights. You can use either the official Meta AI weights or a model that you have fine-tuned. Lit-LLaMA includes a text-generation script that, with quantization, can run on a GPU with 8 GB of memory. To generate text, run the following command in the terminal:<\/p>\n<p><code>python generate.py --quantize true --prompt \"Here's what people think about pineapple pizza: \"<\/code><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-5647705\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/68747470733a2f2f706c2d7075626c69632d646174612e73332e616d617a6f6e6177732e636f6d2f6173736574735f6c696768746e696e672f4c6c616d615f70696e656170706c652e676966.gif\" alt=\"\" width=\"890\" height=\"501\" \/><\/p>\n<h2>Conclusion<\/h2>\n<p>Lit-LLaMA promotes open and collective science by releasing its source code under the Apache 2.0 license. 
It extends the original Meta AI code by adding training, an optimized fine-tuning script, and the ability to run inference on a consumer GPU (using up to 8GB of memory with quantization). Lit-LLaMA has already crossed 2K GitHub stars \ud83d\udcab, and it will be interesting to see what the community builds on top of it.<\/p>\n<a target=\"blank\" href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\" class=\"d-inline-block btn btn-purple\">Go to Repo<\/a>\n","protected":false},"excerpt":{"rendered":"<p>What is LLaMA \ud83e\udd99 LLaMA is a foundational large language model that has been released by Meta AI. LLaMA comes in four size variants: 7B, 13B, 33B, and 65B parameters. The paper shows that training smaller foundation models on large enough tokens is desirable, as it requires less computing power and resources. The 65B parameter<a class=\"excerpt-read-more\" href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/\" title=\"ReadAccelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA\">&#8230; Read more &raquo;<\/a><\/p>\n","protected":false},"author":16,"featured_media":5647659,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[106,41],"tags":[],"glossary":[217],"acf":{"additional_authors":false,"default_editor":true,"show_table_of_contents":false,"hide_from_archive":false,"content_type":"Blog Post","sticky":false,"custom_styles":""},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA - Lightning AI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" 
\/>\n<link rel=\"canonical\" href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA - Lightning AI\" \/>\n<meta property=\"og:description\" content=\"What is LLaMA \ud83e\udd99 LLaMA is a foundational large language model that has been released by Meta AI. LLaMA comes in four size variants: 7B, 13B, 33B, and 65B parameters. The paper shows that training smaller foundation models on large enough tokens is desirable, as it requires less computing power and resources. The 65B parameter... Read more &raquo;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/\" \/>\n<meta property=\"og:site_name\" content=\"Lightning AI\" \/>\n<meta property=\"article:published_time\" content=\"2023-04-06T16:45:16+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-06-22T17:30:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1816\" \/>\n\t<meta property=\"og:image:height\" content=\"1274\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"JP Hennessy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:site\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"JP Hennessy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/\"},\"author\":{\"name\":\"JP Hennessy\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\"},\"headline\":\"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA\",\"datePublished\":\"2023-04-06T16:45:16+00:00\",\"dateModified\":\"2023-06-22T17:30:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/\"},\"wordCount\":1125,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png\",\"articleSection\":[\"Community\",\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/\",\"url\":\"https:\/\/lightning.ai\/pa
ges\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/\",\"name\":\"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA - Lightning AI\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png\",\"datePublished\":\"2023-04-06T16:45:16+00:00\",\"dateModified\":\"2023-06-22T17:30:45+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#primaryimage\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png\",\"width\":1816,\"height\":1274},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#breadcrumb\",\"ite
mListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lightning.ai\/pages\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lightning.ai\/pages\/#website\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"name\":\"Lightning AI\",\"description\":\"The platform for teams to build AI.\",\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lightning.ai\/pages\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\",\"name\":\"Lightning AI\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"width\":1744,\"height\":856,\"caption\":\"Lightning AI\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/LightningAI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\",\"name\":\"JP 
Hennessy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"caption\":\"JP Hennessy\"},\"url\":\"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA - Lightning AI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/","og_locale":"en_US","og_type":"article","og_title":"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA - Lightning AI","og_description":"What is LLaMA \ud83e\udd99 LLaMA is a foundational large language model that has been released by Meta AI. LLaMA comes in four size variants: 7B, 13B, 33B, and 65B parameters. The paper shows that training smaller foundation models on large enough tokens is desirable, as it requires less computing power and resources. The 65B parameter... 
Read more &raquo;","og_url":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/","og_site_name":"Lightning AI","article_published_time":"2023-04-06T16:45:16+00:00","article_modified_time":"2023-06-22T17:30:45+00:00","og_image":[{"width":1816,"height":1274,"url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png","type":"image\/png"}],"author":"JP Hennessy","twitter_card":"summary_large_image","twitter_creator":"@LightningAI","twitter_site":"@LightningAI","twitter_misc":{"Written by":"JP Hennessy","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#article","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/"},"author":{"name":"JP Hennessy","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6"},"headline":"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning 
LLaMA","datePublished":"2023-04-06T16:45:16+00:00","dateModified":"2023-06-22T17:30:45+00:00","mainEntityOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/"},"wordCount":1125,"commentCount":0,"publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png","articleSection":["Community","Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/","url":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/","name":"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning LLaMA - Lightning 
AI","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#primaryimage"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png","datePublished":"2023-04-06T16:45:16+00:00","dateModified":"2023-06-22T17:30:45+00:00","breadcrumb":{"@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#primaryimage","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/03\/ScreenShot2023-03-31at11.33.30AM.png","width":1816,"height":1274},{"@type":"BreadcrumbList","@id":"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-llama-with-fabric-a-comprehensive-guide-to-training-and-fine-tuning-llama\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lightning.ai\/pages\/"},{"@type":"ListItem","position":2,"name":"Accelerating LLaMA with Fabric: A Comprehensive Guide to Training and Fine-Tuning 
LLaMA"}]},{"@type":"WebSite","@id":"https:\/\/lightning.ai\/pages\/#website","url":"https:\/\/lightning.ai\/pages\/","name":"Lightning AI","description":"The platform for teams to build AI.","publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lightning.ai\/pages\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/lightning.ai\/pages\/#organization","name":"Lightning AI","url":"https:\/\/lightning.ai\/pages\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","width":1744,"height":856,"caption":"Lightning AI"},"image":{"@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/LightningAI"]},{"@type":"Person","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6","name":"JP Hennessy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","caption":"JP 
Hennessy"},"url":"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/"}]}},"_links":{"self":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5647703"}],"collection":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/comments?post=5647703"}],"version-history":[{"count":0,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5647703\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media\/5647659"}],"wp:attachment":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media?parent=5647703"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/categories?post=5647703"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/tags?post=5647703"},{"taxonomy":"glossary","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/glossary?post=5647703"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}