{"id":5647794,"date":"2023-04-26T08:51:09","date_gmt":"2023-04-26T12:51:09","guid":{"rendered":"https:\/\/lightning.ai\/pages\/?p=5647794"},"modified":"2023-06-22T13:13:15","modified_gmt":"2023-06-22T17:13:15","slug":"lora-llm","status":"publish","type":"post","link":"https:\/\/lightning.ai\/pages\/community\/tutorial\/lora-llm\/","title":{"rendered":"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)"},"content":{"rendered":"<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Key takeaway<\/h3> In the rapidly evolving field of AI, using large language models in an efficient and effective manner is becoming more and more important. In this article, you will learn how to tune an LLM with Low-Rank Adaptation (LoRA) in a computationally efficient manner! <\/div>\n<p>&nbsp;<\/p>\n<h2><strong>Why Finetuning?<\/strong><\/h2>\n<p>Pretrained large language models are often referred to as foundation models for a good reason: they perform well on various tasks, and we can use them as a foundation for finetuning on a target task. As discussed in our previous article (<a href=\"https:\/\/lightning.ai\/pages\/community\/article\/understanding-llama-adapters\/\">Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to LLaMA-Adapters<\/a>), we discussed finetuning allows us to adapt a model to a target domain and target task. Still, it can be computationally very costly &#8212; the larger the model, the more expensive it is to update its layers.<\/p>\n<p>As an alternative to updating all layers, parameter-efficient methods such as prefix tuning and adapters have been developed &#8212; for a detailed review, please see our <a href=\"https:\/\/lightning.ai\/pages\/community\/article\/understanding-llama-adapters\/\">previous post<\/a>. Now, there is one more popular parameter-efficient finetuning technique: <a href=\"https:\/\/arxiv.org\/abs\/2106.09685\">Low-rank adaptation (LoRA) by Hu et al<\/a>. 
What is LoRA? How does it work? And how does it compare to the other popular finetuning approaches? Let&#8217;s answer all these questions in this article!<\/p>\n<p>&nbsp;<\/p>\n<div align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-5647795 aligncenter\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-1.jpg\" alt=\"PCA transformation\" width=\"874\" height=\"361\" align=\"middle\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-1.jpg 1500w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-1-300x124.jpg 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-1-1024x423.jpg 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-1-300x124@2x.jpg 600w\" sizes=\"(max-width: 874px) 100vw, 874px\" \/><\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Making Weight Updates More Efficient<\/strong><\/h2>\n<p>Building on this idea outlined above, the paper\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2106.09685\">LoRA: Low-Rank Adaptation of Large Language Models<\/a>\u00a0proposes to decompose the weight changes,\u00a0<em>\u0394W<\/em>, into a lower-rank representation. (To be technically correct, LoRA does not decompose the matrices directly, but it learns the decomposed matrices via backpropagation \u2014 this is a nitpicky detail that will make sense later).<\/p>\n<p>Before we take a closer look at LoRA, let\u2019s briefly explain the training procedure during regular finetuning. So, what are the weight changes\u00a0<em>\u0394W<\/em>? Suppose\u00a0<em>W<\/em>\u00a0represents the weight matrix in a given neural network layer. 
Then, using regular backpropagation, we can obtain the weight update\u00a0<em>\u0394W<\/em>, which is typically calculated as a negative gradient of the loss times the learning rate:<\/p>\n<p><em>\u0394W<\/em> = <em>\u03b1<\/em> ( -\u2207 L<sub>W<\/sub>).<\/p>\n<p>Then, when we have <em>\u0394W<\/em>, we can update the original weights as follows: <em>W<\/em>&#8216; = <em>W<\/em> + <em>\u0394W<\/em>. This is illustrated in the figure below (bias vectors are omitted for simplicity):<\/p>\n<p>Alternatively, we can keep the weight update matrix separate and compute the outputs as follows: <em>h = W x + \u0394W x<\/em>,<\/p>\n<p>&nbsp;<\/p>\n<div align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-5647796 aligncenter\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-2.png\" alt=\"Regular backprop\" width=\"614\" height=\"308\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-2.png 1928w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-2-300x151.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-2-1024x514.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-2-1536x771.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-2-300x151@2x.png 600w\" sizes=\"(max-width: 614px) 100vw, 614px\" \/><\/div>\n<p>&nbsp;<\/p>\n<p>where <span class=\"notion-text-equation-token\">x<\/span> represents the inputs, as illustrated below.<\/p>\n<p>&nbsp;<\/p>\n<div align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-5647797 aligncenter\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-3.png\" alt=\"\" width=\"424\" height=\"308\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-3.png 1224w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-3-300x217.png 300w, 
https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-3-1024x742.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-3-300x217@2x.png 600w\" sizes=\"(max-width: 424px) 100vw, 424px\" \/><\/div>\n<p>&nbsp;<\/p>\n<p>Why would we do this? For now, this alternative formulation serves a pedagogical goal to illustrate LoRA, but we will come back to it.<\/p>\n<p>So, when we train fully connected (i.e., &#8220;dense&#8221;) layers in a neural network, as shown above, the weight matrices usually have full rank, which is a technical term meaning that a matrix does not have any linearly dependent (i.e., &#8220;redundant&#8221;) rows or columns. In contrast to full rank, low rank means that the matrix has redundant rows or columns.<\/p>\n<p>While the weights of a pretrained model have full rank on the pretrained tasks, the LoRA authors point out that pretrained large language models have a low &#8220;intrinsic dimension&#8221; when they are adapted to a new task, according to\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2012.13255\">Aghajanyan et al.<\/a>\u00a0(2020).<\/p>\n<p>A low intrinsic dimension means the data can be effectively represented or approximated by a lower-dimensional space while retaining most of its essential information or structure. In other words, this means we can decompose the new weight matrix for the adapted task into lower-dimensional (smaller) matrices without losing too much important information.<\/p>\n<p>For example, suppose <em>\u0394W<\/em> is the weight update for an <em>A \u00d7 B<\/em> weight matrix. Then, we can decompose the weight update matrix into two smaller matrices: <em>\u0394W = W<sub>A<\/sub> W<sub>B<\/sub><\/em>, where <em>W<sub>A<\/sub><\/em> is an <em>A \u00d7 r<\/em>-dimensional matrix, and <em>W<sub>B<\/sub><\/em> is an <em>r \u00d7 B<\/em>-dimensional matrix. 
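<\/p>
<p>To make the decomposition concrete, here is a minimal NumPy sketch (the dimensions and the seed are made up for illustration) verifying that the product of an <em>A \u00d7 r<\/em> and an <em>r \u00d7 B<\/em> matrix is a full-size <em>A \u00d7 B<\/em> matrix whose rank is capped at <em>r<\/em>:<\/p>

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, r = 100, 500, 4            # illustrative dimensions; r is the bottleneck rank
W_A = rng.standard_normal((A, r))
W_B = rng.standard_normal((r, B))
delta_W = W_A @ W_B              # full-size A x B update matrix

print(delta_W.shape)                   # (100, 500)
print(np.linalg.matrix_rank(delta_W))  # 4 -- the rank cannot exceed r
```

<p>So even though <em>\u0394W<\/em> has the same shape as the original weight matrix, it only carries the degrees of freedom of the two small factors.<\/p>
<p>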
Here, we keep the original weight <em>W<\/em> frozen and only train the new matrices <em>W<sub>A<\/sub><\/em> and <em>W<sub>B<\/sub><\/em>. This, in a nutshell, is the LoRA method, which is illustrated in the figure below.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<div align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-5647798 aligncenter\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-4.png\" alt=\"\" width=\"368\" height=\"277\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-4.png 1372w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-4-300x226.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-4-1024x772.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-4-300x226@2x.png 600w\" sizes=\"(max-width: 368px) 100vw, 368px\" \/><\/div>\n<p>&nbsp;<\/p>\n<p>&nbsp;<br \/>\n<strong>Choosing the rank<\/strong><\/p>\n<p>Note that <em>r<\/em>, in the figure above, is a hyperparameter that we can use to specify the rank of the low-rank matrices used for adaptation. A smaller <em>r<\/em> leads to a simpler low-rank matrix, which results in fewer parameters to learn during adaptation. This can lead to faster training and potentially reduced computational requirements. However, with a smaller <em>r<\/em>, the capacity of the low-rank matrix to capture task-specific information decreases. This may result in lower adaptation quality, and the model might not perform as well on the new task compared to a higher <em>r<\/em>. In summary, choosing a smaller <em>r<\/em> in LoRA involves a trade-off between model complexity, adaptation capacity, and the risk of underfitting or overfitting. 
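<\/p>
<p>To get a feel for the numbers, the short sketch below tabulates the trainable LoRA parameters for a few choices of <em>r<\/em> against full finetuning of a single weight matrix (the 4096\u00d74096 layer size is a hypothetical, roughly LLaMA-scale example):<\/p>

```python
d_in, d_out = 4096, 4096          # hypothetical layer dimensions
full = d_in * d_out               # parameters updated by regular finetuning

for r in (1, 4, 8, 64):
    lora = d_in * r + r * d_out   # parameters in W_A plus W_B
    print(f"r={r:>2}: {lora:>7,} LoRA params ({lora / full:.3%} of full)")
```

<p>Even a comparatively generous rank of 64 trains well under one percent of the parameters of the full matrix.<\/p>
<p>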
It\u2019s thus important to experiment with different <em>r<\/em> values to find the right balance to achieve the desired performance on the new task.<\/p>\n<p>&nbsp;<br \/>\n<strong>Implementing LoRA<\/strong><\/p>\n<p>The implementation of LoRA is relatively straightforward. We can think of it as a modified forward pass for the fully connected layers in an LLM. In pseudo-code, this looks as follows:<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\nimport math<br \/>\nimport torch<br \/>\nimport torch.nn as nn\n\ninput_dim = 768  # e.g., the hidden size of the pre-trained model<br \/>\noutput_dim = 768  # e.g., the output size of the layer<br \/>\nrank = 8  # The rank 'r' for the low-rank adaptation<br \/>\nalpha = 1  # scaling factor for the LoRA branch (explained below)\n\nW = ... # from pretrained network with shape input_dim x output_dim\n\nW_A = nn.Parameter(torch.empty(input_dim, rank)) # LoRA weight A<br \/>\nW_B = nn.Parameter(torch.empty(rank, output_dim)) # LoRA weight B\n\n# Initialization of LoRA weights<br \/>\nnn.init.kaiming_uniform_(W_A, a=math.sqrt(5))<br \/>\nnn.init.zeros_(W_B)\n\ndef regular_forward_matmul(x, W):<br \/>\n    h = x @ W<br \/>\n    return h\n\ndef lora_forward_matmul(x, W, W_A, W_B):<br \/>\n    h = x @ W  # regular matrix multiplication<br \/>\n    h += x @ (W_A @ W_B) * alpha  # use scaled LoRA weights<br \/>\n    return h<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>&nbsp;<\/p>\n<p>In the pseudo-code above, <code>alpha<\/code> is a scaling factor that adjusts the magnitude of the combined result (original model output plus low-rank adaptation). This balances the pretrained model&#8217;s knowledge and the new task-specific adaptation \u2014 by default, <code>alpha<\/code> is usually set to 1. 
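<\/p>
<p>A quick NumPy sanity check (toy dimensions, mirroring the pseudo-code above rather than any production implementation) confirms that with <em>W<sub>B<\/sub><\/em> initialized to zeros, the LoRA forward pass initially matches the regular forward pass exactly:<\/p>

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((2, 16))           # a toy batch of inputs
W = rng.standard_normal((16, 16))          # frozen pretrained weight
W_A = rng.standard_normal((16, 4)) * 0.01  # small random initialization
W_B = np.zeros((4, 16))                    # zero initialization, so W_A @ W_B == 0
alpha = 1.0

h_regular = x @ W
h_lora = x @ W + (x @ (W_A @ W_B)) * alpha
print(np.allclose(h_regular, h_lora))      # True at the start of training
```

<p>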
Also note that while <em>W<sub>A<\/sub><\/em> is initialized to small random weights, <em>W<sub>B<\/sub><\/em> is initialized to 0 so that<\/p>\n<p><em>\u0394W = W<sub>A<\/sub> W<sub>B<\/sub> = 0 <\/em> at the beginning of the training, meaning we begin the training with the original weights.<\/p>\n<p>&nbsp;<br \/>\n<strong>Parameter efficiency<\/strong><\/p>\n<p>Now, let&#8217;s address the elephant in the room: how is this parameter efficient if we introduce new weight matrices? The new matrices <em>W<sub>A<\/sub><\/em> and <em>W<sub>B<\/sub><\/em> can be very small. For example, if <em>A=100<\/em> and <em>B=500<\/em>, then the size of <em>\u0394W<\/em> is <em>100 \u00d7 500 = 50,000<\/em>. Now, suppose we decompose this into two smaller matrices: a <em>100\u00d75<\/em>-dimensional matrix <em>W<sub>A<\/sub><\/em> and a <em>5\u00d7500<\/em>-dimensional matrix <em>W<sub>B<\/sub><\/em>. These two matrices only have <em>5 \u00d7 100 + 5 \u00d7 500 = 3,000<\/em> parameters in total.<\/p>\n<p>&nbsp;<br \/>\n<strong>Reducing inference overhead<\/strong><\/p>\n<p>Note that in practice, if we keep the original weights <em>W<\/em> and the matrices <em>W<sub>A<\/sub><\/em> and <em>W<sub>B<\/sub><\/em> separate after training as shown above, we will incur a small efficiency penalty during inference as this introduces an additional computation step. Instead, we can update the weights after training via <em>W&#8217; = W + W<sub>A<\/sub> W<sub>B<\/sub><\/em>, which is analogous to <em>W&#8217; = W + \u0394W<\/em> mentioned earlier.<\/p>\n<p>However, there can be practical advantages in keeping the weight matrices <em>W<sub>A<\/sub><\/em> and <em>W<sub>B<\/sub><\/em> separate. For example, imagine we want to keep our pretrained model as a base model for various customers, and we want to create a finetuned LLM for each customer starting from the base model. 
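<\/p>
<p>The equivalence between the merged and the unmerged formulations is easy to verify numerically; the sketch below (toy dimensions, not the actual Lit-LLaMA code) checks that multiplying by the merged weight <em>W + W<sub>A<\/sub> W<sub>B<\/sub><\/em> matches the two-branch forward pass:<\/p>

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal((3, 8))       # toy batch of inputs
W = rng.standard_normal((8, 8))       # frozen pretrained weight
W_A = rng.standard_normal((8, 2))
W_B = rng.standard_normal((2, 8))

unmerged = x @ W + x @ (W_A @ W_B)    # extra matmul at every inference step
W_merged = W + W_A @ W_B              # merge once after training
merged = x @ W_merged                 # single matmul at inference time
print(np.allclose(unmerged, merged))  # True
```

<p>Merging once after training removes the extra matrix multiplication from every subsequent forward pass, at the cost of no longer being able to swap the small LoRA factors in and out.<\/p>
<p>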
In this case, we don&#8217;t need to store the full weight matrices <em>W&#8217;<\/em> for each customer; storing all the weights <em>W&#8217; = W + W<sub>A<\/sub> W<sub>B<\/sub><\/em> would require a lot of storage for LLMs, which typically have billions of weight parameters. So instead, we can keep the original model <em>W<\/em> and only need to store the new lightweight matrices <em>W<sub>A<\/sub><\/em> and <em>W<sub>B<\/sub><\/em>.<\/p>\n<p>To illustrate this point with concrete numbers, a full 7B LLaMA checkpoint requires 23 GB of storage capacity, while the LoRA weights can be as small as 8 MB if we choose a rank of <em>r=8<\/em>.<\/p>\n<p>&nbsp;<br \/>\n<strong>How good is it in practice?<\/strong><\/p>\n<p>How good is LoRA in practice, and how does it compare to full finetuning and other parameter-efficient approaches? According to the <a href=\"https:\/\/arxiv.org\/abs\/2106.09685\">LoRA paper<\/a>, models using LoRA perform slightly better than models using <a href=\"https:\/\/arxiv.org\/abs\/2110.07280\">Adapters<\/a>, <a href=\"https:\/\/arxiv.org\/abs\/2104.08691\">prompt tuning<\/a>, or <a href=\"https:\/\/arxiv.org\/abs\/2101.00190\">prefix tuning<\/a> across several task-specific benchmarks. Often, LoRA performs even better than finetuning all layers, as shown in the annotated table from the LoRA paper below. 
(ROUGE is a metric for evaluating text summarization performance; I explained it in more detail <a href=\"https:\/\/twitter.com\/rasbt\/status\/1639625228622917632?s=20\">here<\/a>.)<\/p>\n<p>&nbsp;<\/p>\n<div align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-5647799 aligncenter\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-5.png\" alt=\"\" width=\"740\" height=\"369\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-5.png 1872w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-5-300x150.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-5-1024x511.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-5-1536x766.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-5-300x150@2x.png 600w\" sizes=\"(max-width: 740px) 100vw, 740px\" \/><\/div>\n<p>&nbsp;<\/p>\n<p>Here, it\u2019s worth noting that LoRA is orthogonal to the other finetuning methods, meaning it can also be combined with prefix tuning and adapters, for example.<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>LoRA &amp; LLaMA<\/strong><\/h2>\n<p>Now, let&#8217;s work with an implementation of LoRA for finetuning Meta&#8217;s popular LLaMA model. 
Since this is already a long article, I will refrain from including the detailed code in this article itself, but I recommend checking out the <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\">Lit-LLaMA repository<\/a>, which is a simple, readable reimplementation of LLaMA.<\/p>\n<p>Besides code for training and running LLaMA itself (with the original Meta LLaMA weights), it also contains code for finetuning LLaMA using <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/finetune_adapter.py\">LLaMA-Adapter<\/a> and <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/finetune_lora.py\">LoRA<\/a>.<\/p>\n<p>To get started, I recommend the following <em>How-To<\/em> files:<\/p>\n<ol>\n<li>Downloading pretrained weights [ <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/howto\/download_weights.md\">download_weights.md<\/a> ]<\/li>\n<li>Finetuning with LoRA [ <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/howto\/finetune_lora.md\">finetune_lora.md<\/a> ]<\/li>\n<li>Finetuning with Adapter [ <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\/blob\/main\/howto\/finetune_adapter.md\">finetune_adapter.md<\/a> ] (optional, for comparison studies)<\/li>\n<\/ol>\n<p>In the next section, we will compare the 7B LLaMA base model with the 7B LLaMA base finetuned using LoRA and LLaMA-Adapter. (Note that this requires a GPU with at least 24 GB of RAM). 
(For more details on the LLaMA-Adapter method, please see my <a href=\"https:\/\/lightning.ai\/pages\/community\/article\/understanding-llama-adapters\/\">previous article<\/a>.)<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Computational Performance Benchmarks<\/strong><\/h2>\n<p>In this section, we will compare the computational performance of the LLaMA 7B base model with the base model finetuned using LoRA and LLaMA-Adapter.<\/p>\n<p>The finetuning dataset is the Alpaca 52k instruction dataset described <a href=\"https:\/\/github.com\/tatsu-lab\/stanford_alpaca#data-release\">here<\/a>, which has the following structure:<\/p>\n<p>&nbsp;<\/p>\n<div align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-5647800 aligncenter\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-6.png\" alt=\"\" width=\"687\" height=\"364\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-6.png 1497w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-6-300x159.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-6-1024x543.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-6-300x159@2x.png 600w\" sizes=\"(max-width: 687px) 100vw, 687px\" \/><\/div>\n<p>&nbsp;<\/p>\n<p>The dataset itself was generated following the method described in the <a href=\"https:\/\/arxiv.org\/abs\/2212.10560\">Self-Instruct paper<\/a> and consists of 49,759 training examples and 2,000 validation examples.<\/p>\n<p>How does the Self-Instruct procedure work? 
In a nutshell, it&#8217;s a four-step process:<\/p>\n<ol>\n<li>Seed the task pool with a set of human-written instructions (175 in this case) and sample instructions from it<\/li>\n<li>Use a pretrained LLM (like GPT-3) to determine the task category<\/li>\n<li>Given the new instruction, let a pretrained LLM generate the response<\/li>\n<li>Collect, prune, and filter the responses before adding them to the task pool<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<div align=\"center\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-5647801 aligncenter\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-7.png\" alt=\"\" width=\"981\" height=\"444\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-7.png 2218w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-7-300x136.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-7-1024x464.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-7-1536x695.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-7-2048x927.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-7-300x136@2x.png 600w\" sizes=\"(max-width: 981px) 100vw, 981px\" \/><\/div>\n<p>&nbsp;<\/p>\n<p>Note that the Alpaca 52k dataset was collected using the automated self-instruct procedure above. However, you may also use (or compare it with) an alternative dataset. For example, an interesting candidate is the recently released open-source <a href=\"https:\/\/github.com\/databrickslabs\/dolly\/tree\/master\/data\">databricks-dolly-15k<\/a> dataset that contains ~15k instruction\/response finetuning records written by Databricks employees. 
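<\/p>
<p>Records in these instruction datasets consist of instruction, input, and output fields. As a purely illustrative sketch (the field names follow the public Alpaca format, but this helper is not Lit-LLaMA&#8217;s actual preprocessing code), turning such a record into a training prompt could look like this:<\/p>

```python
# A hypothetical Alpaca-style record; the field names follow the public
# dataset, but the prompt template below is illustrative only.
record = {
    "instruction": "Please explain how weight decay works",
    "input": "",
    "output": "Weight decay adds a penalty on large weights ...",
}

def build_prompt(rec):
    # Records with a non-empty "input" field get an extra context section.
    prompt = f"### Instruction:\n{rec['instruction']}\n"
    if rec["input"]:
        prompt += f"### Input:\n{rec['input']}\n"
    prompt += "### Response:\n"
    return prompt

print(build_prompt(record))
```

<p>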
The Lit-LLaMA repository contains a dataset preparation script in case you want to use this Dolly 15k dataset instead of the Alpaca 52k dataset.<\/p>\n<p>Given the following hyperparameter settings (block size, batch size, and LoRA r), both Adapter and LoRA can finetune the 7B parameter LLaMA base model on a single GPU with 24 GB of RAM using bfloat-16 mixed precision training.<\/p>\n<p><strong>LoRA<\/strong><\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\nlearning_rate = 3e-4<br \/>\nbatch_size = 128<br \/>\nmicro_batch_size = 4<br \/>\ngradient_accumulation_steps = batch_size \/\/ micro_batch_size<br \/>\nepoch_size = 50000 # train dataset size<br \/>\nnum_epochs = 5<br \/>\nmax_iters = num_epochs * epoch_size \/\/ micro_batch_size \/\/ devices<br \/>\nweight_decay = 0.0<br \/>\nblock_size = 512<br \/>\nlora_r = 8<br \/>\nlora_alpha = 16<br \/>\nlora_dropout = 0.05<br \/>\nwarmup_steps = 100<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p><strong><span class=\"notion-enable-hover\" data-token-index=\"0\">LLaMA-Adapter<\/span><\/strong><\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\nlearning_rate = 9e-3<br \/>\nbatch_size = 128 \/ devices<br \/>\nmicro_batch_size = 4<br \/>\ngradient_accumulation_steps = batch_size \/\/ micro_batch_size<br \/>\nepoch_size = 50000 # train dataset size<br \/>\nnum_epochs = 5<br \/>\nmax_iters = num_epochs * epoch_size \/\/ micro_batch_size \/\/ devices<br \/>\nweight_decay = 0.02<br \/>\nblock_size = 512<br \/>\nwarmup_steps = epoch_size * 2 \/\/ micro_batch_size \/\/ devices<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>In case the code changes in the future, I 
am including the code (with hyperparameter settings) <a href=\"https:\/\/github.com\/rasbt\/low-rank-adaptation-blog\">here on GitHub<\/a>.<\/p>\n<p>Adapter used about 22 GB and finished 62,400 iterations in 162 min on an A100. LoRA used 21 GB of memory and finished in 192 min. In sum, Adapter and LoRA use approximately the same amount of RAM and have roughly the same training time based on the Lit-LLaMA implementations. (Note that this is on a single GPU, but if you have multiple GPUs, just change the <code>devices<\/code> parameter to &gt; 1 to take advantage of additional speedups!)<\/p>\n<p>For comparison, full finetuning (LLaMA 7B consists of 32 transformer blocks and 3 fully connected output layers) required at least 2 GPUs with at least 30 GB each and fully sharded training to distribute the weights. Alternatively, you can use 4 GPUs with a maximum memory usage of 22 GB per GPU. The training on 4 GPUs took 1,956 min. This would be at least 6,000 min on a single GPU, which would be 30-40x more expensive than the parameter-efficient LLaMA-Adapter or LoRA alternatives.<\/p>\n<p>Next, let&#8217;s look at the model outputs after applying the different finetuning strategies.<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Evaluating Modeling Performance<\/strong><\/h2>\n<p>There are several metrics for evaluating the text generated by LLMs. For example, perplexity, BLEU, and ROUGE scores are some of the most common evaluation metrics used in natural language processing to assess the performance of LLMs across various tasks. However, all of these metrics have substantial shortcomings, and human evaluations remain the gold standard &#8212; the downside of human evaluations is that they are expensive to create and hard to automate. Since this is already a very long article, I will refrain from a detailed discussion of model evaluation approaches and will defer this to a separate article in the future. 
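<\/p>
<p>As a small taste of one such metric: perplexity is simply the exponential of the average per-token cross-entropy, as the sketch below computes for some made-up token losses:<\/p>

```python
import math

# Hypothetical per-token negative log-likelihoods from a language model
nlls = [2.1, 1.7, 2.4, 1.9]

perplexity = math.exp(sum(nlls) / len(nlls))  # exp of the mean cross-entropy
print(round(perplexity, 2))  # about 7.58; lower is better
```

<p>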
In this future article, we will be looking at different Q&amp;A datasets (including <a href=\"https:\/\/paperswithcode.com\/dataset\/wikitext-2\">wikitext<\/a>, <a href=\"https:\/\/paperswithcode.com\/dataset\/c4\">c4<\/a>, and <a href=\"https:\/\/www.kaggle.com\/datasets\/bjoernjostein\/ptbxl-electrocardiography-database\">ptb<\/a>), as well as the above-mentioned metrics. So, for simplicity, we will only be focusing on a qualitative model comparison below, looking at some outputs generated by the various models.<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>A Qualitative Comparison<\/strong><\/h2>\n<p>To compare the models qualitatively, we will be using the prompt <code>\"Please explain how weight decay works\"<\/code><\/p>\n<p>&nbsp;<br \/>\n<strong>Base model<\/strong><\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">$ python generate.py \\<br \/>\n--checkpoint_path checkpoints\/lit-llama\/7B\/lit-llama.pth<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<blockquote><p><em>Please explain how weight decay works&#8230;and why it sometimes seems to work and sometimes doesn&#8217;t!<\/em><\/p>\n<p><em>In the simplest terms, weight decay removes features from the model (as a function of how important they are). It is also known as regularization.<\/em><\/p>\n<p><em>Regularization has<\/em><\/p>\n<p>Time for inference 1: 4.18 sec total, 11.95 tokens\/sec<\/p><\/blockquote>\n<p>As we can see, the model begins to continue the prompt by appending &#8220;&#8230;and why it sometimes seems to work and sometimes doesn&#8217;t!&#8221; That&#8217;s because the base model was only trained on a next-word prediction task and wasn&#8217;t specifically trained to follow instructions. Nonetheless, it attempts the explanation further below. 
The explanation makes somewhat correct statements about regularization in general, but it doesn&#8217;t really capture the key mechanics of weight decay.<\/p>\n<p><strong><span class=\"notion-enable-hover\" data-token-index=\"0\">LLaMA-Adapter<\/span><\/strong><\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\">$ python generate_adapter.py \\<br \/>\n--adapter_path out\/adapter\/alpaca\/lit-llama-adapter-finetuned.pth \\<br \/>\n--pretrained_path checkpoints\/lit-llama\/7B\/lit-llama.pth<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<blockquote><p><em>Weight decay is a way to penalize terms for their word-count. By choosing sentences with fewer words, a model is more likely to select those sentences, which may provide important information.<\/em><\/p>\n<p>Time for inference: 7.04 sec total, 14.21 tokens\/sec<\/p>\n<p>Memory used: 13.56 GB<\/p><\/blockquote>\n<p>The explanation is grammatically sound, but it is also not necessarily what we are looking for.<\/p>\n<p><strong><span class=\"notion-enable-hover\" data-token-index=\"0\">LoRA<\/span><\/strong><\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\n$ python generate_lora.py \\<br \/>\n--lora_path out\/lora\/alpaca\/lit-llama-lora-finetuned.pth \\<br \/>\n--pretrained_path checkpoints\/lit-llama\/7B\/lit-llama.pth<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<blockquote><p><em>Weight decay is an algorithm parameter used in machine learning models to reduce overfitting. It is a process where the weights of the model are adjusted in order to decrease the influence of less-important features in the model and prevent overfitting. 
Weight decay is used by adding a small amount of noise to the weights to reduce the impact of the model on the training data. This process encourages the model to focus more on the features that are more important to the predictive task.<\/em><\/p>\n<p>Time for inference: 3.25 sec total, 30.76 tokens\/sec<\/p>\n<p>Memory used: 27.65 GB<\/p><\/blockquote>\n<p>We are getting closer; the explanation does make sense when we think of generalized linear models where we force the model to learn smaller weight parameters that are multiplied with the input features. In neural networks, this would typically be applied to all weight parameters in the model.<\/p>\n<p>Note that the LoRA approach above currently uses the most memory. However, we can reduce this memory usage by merging the LoRA weights with the pretrained model weights, as described earlier.<\/p>\n<p>This qualitative overview is only a thin slice of the capabilities of each of these models since evaluating LLMs is a big topic in itself. We will revisit this topic in a more detailed article in the future. But as a takeaway here, LoRA can be used to finetune an LLM on an instruction dataset in a relatively cost-effective manner.<\/p>\n<p>&nbsp;<\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p>In this article, we discussed low-rank adaptation (LoRA), a parameter-efficient alternative to full finetuning. We saw that finetuning a relatively large model such as LLaMA can be done in a few hours on a single GPU using LoRA, which makes it particularly attractive to people who don&#8217;t want to spend thousands of dollars on GPU resources. 
What&#8217;s particularly nice about LoRA is that we can optionally merge the new LoRA weight matrices with the original, pretrained weights, such that we don&#8217;t incur additional overhead or complexity during inference.<\/p>\n<p>As more and more open-source alternatives to ChatGPT or GPT-4 emerge, finetuning and customizing these LLMs on specific target datasets and tasks will become more and more attractive across various research fields and industries. And parameter-efficient finetuning techniques such as LoRA make finetuning more resource-efficient and accessible.<\/p>\n<p>Parameter-efficient finetuning techniques such as LoRA and LLaMA-Adapter are provided in the <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\">Lit-LLaMA repository<\/a>. We are always happy to receive contributions and suggestions if you have ideas for extensions or alternative techniques. Please don&#8217;t hesitate to reach out to us via <a href=\"https:\/\/github.com\/Lightning-AI\/lit-llama\">GitHub<\/a> or <a href=\"https:\/\/discord.com\/invite\/XncpTy7DSt\">Discord<\/a>.<\/p>\n<p>&nbsp;<br \/>\n<strong>Acknowledgements<\/strong><\/p>\n<p>I want to thank Luca Antiga and Adrian Waelchli for the constructive feedback that improved the clarity of this article.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Why Finetuning? Pretrained large language models are often referred to as foundation models for a good reason: they perform well on various tasks, and we can use them as a foundation for finetuning on a target task. 
As discussed in our previous article (Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to<a class=\"excerpt-read-more\" href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/lora-llm\/\" title=\"ReadParameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)\">&#8230; Read more &raquo;<\/a><\/p>\n","protected":false},"author":16,"featured_media":5647802,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27,41],"tags":[],"glossary":[218],"acf":{"additional_authors":false,"default_editor":true,"show_table_of_contents":false,"hide_from_archive":false,"content_type":"Blog Post","sticky":false,"custom_styles":"","mathjax":false},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA) - Lightning AI<\/title>\n<meta name=\"description\" content=\"In the rapidly evolving field of AI, using large language models in an efficient and effective manner is becoming more and more important. In this article, you will learn how to tune an LLM with Low-Rank Adaptation (LoRA) computationally efficiently!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA) - Lightning AI\" \/>\n<meta property=\"og:description\" content=\"In the rapidly evolving field of AI, using large language models in an efficient and effective manner is becoming more and more important. 
In this article, you will learn how to tune an LLM with Low-Rank Adaptation (LoRA) computationally efficiently!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/\" \/>\n<meta property=\"og:site_name\" content=\"Lightning AI\" \/>\n<meta property=\"article:published_time\" content=\"2023-04-26T12:51:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-06-22T17:13:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1902\" \/>\n\t<meta property=\"og:image:height\" content=\"886\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"JP Hennessy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png\" \/>\n<meta name=\"twitter:creator\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:site\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"JP Hennessy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"15 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/\"},\"author\":{\"name\":\"JP Hennessy\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\"},\"headline\":\"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)\",\"datePublished\":\"2023-04-26T12:51:09+00:00\",\"dateModified\":\"2023-06-22T17:13:15+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/\"},\"wordCount\":3015,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png\",\"articleSection\":[\"Articles\",\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/\",\"url\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/\",\"name\":\"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA) - Lightning 
AI\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png\",\"datePublished\":\"2023-04-26T12:51:09+00:00\",\"dateModified\":\"2023-06-22T17:13:15+00:00\",\"description\":\"In the rapidly evolving field of AI, using large language models in an efficient and effective manner is becoming more and more important. In this article, you will learn how to tune an LLM with Low-Rank Adaptation (LoRA) computationally efficiently!\",\"breadcrumb\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#primaryimage\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png\",\"width\":1902,\"height\":886},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lightning.ai\/pages\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lightning.ai\/pages\/#website\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"name\":\"Lightning AI\",\"description\":\"The platform for teams to build 
AI.\",\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lightning.ai\/pages\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\",\"name\":\"Lightning AI\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"width\":1744,\"height\":856,\"caption\":\"Lightning AI\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/LightningAI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\",\"name\":\"JP Hennessy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"caption\":\"JP Hennessy\"},\"url\":\"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA) - Lightning AI","description":"In the rapidly evolving field of AI, using large language models in an efficient and effective manner is becoming more and more important. 
In this article, you will learn how to tune an LLM with Low-Rank Adaptation (LoRA) computationally efficiently!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/","og_locale":"en_US","og_type":"article","og_title":"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA) - Lightning AI","og_description":"In the rapidly evolving field of AI, using large language models in an efficient and effective manner is becoming more and more important. In this article, you will learn how to tune an LLM with Low-Rank Adaptation (LoRA) computationally efficiently!","og_url":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/","og_site_name":"Lightning AI","article_published_time":"2023-04-26T12:51:09+00:00","article_modified_time":"2023-06-22T17:13:15+00:00","og_image":[{"width":1902,"height":886,"url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png","type":"image\/png"}],"author":"JP Hennessy","twitter_card":"summary_large_image","twitter_image":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png","twitter_creator":"@LightningAI","twitter_site":"@LightningAI","twitter_misc":{"Written by":"JP Hennessy","Est. 
reading time":"15 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#article","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/"},"author":{"name":"JP Hennessy","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6"},"headline":"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)","datePublished":"2023-04-26T12:51:09+00:00","dateModified":"2023-06-22T17:13:15+00:00","mainEntityOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/"},"wordCount":3015,"commentCount":0,"publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png","articleSection":["Articles","Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/","url":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/","name":"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA) - Lightning AI","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#primaryimage"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png","datePublished":"2023-04-26T12:51:09+00:00","dateModified":"2023-06-22T17:13:15+00:00","description":"In the rapidly evolving field of AI, using large language models in an efficient and effective manner is becoming more and more 
important. In this article, you will learn how to tune an LLM with Low-Rank Adaptation (LoRA) computationally efficiently!","breadcrumb":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#primaryimage","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/04\/lora-thumbnail.png","width":1902,"height":886},{"@type":"BreadcrumbList","@id":"https:\/\/lightning.ai\/pages\/community\/article\/lora-llm\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lightning.ai\/pages\/"},{"@type":"ListItem","position":2,"name":"Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)"}]},{"@type":"WebSite","@id":"https:\/\/lightning.ai\/pages\/#website","url":"https:\/\/lightning.ai\/pages\/","name":"Lightning AI","description":"The platform for teams to build AI.","publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lightning.ai\/pages\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/lightning.ai\/pages\/#organization","name":"Lightning 
AI","url":"https:\/\/lightning.ai\/pages\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","width":1744,"height":856,"caption":"Lightning AI"},"image":{"@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/LightningAI"]},{"@type":"Person","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6","name":"JP Hennessy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","caption":"JP Hennessy"},"url":"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/"}]}},"_links":{"self":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5647794"}],"collection":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/comments?post=5647794"}],"version-history":[{"count":0,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5647794\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media\/5647802"}],"wp:attachment":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media?parent=5647794"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/categories?post=5647794"},{"taxonomy":"post_tag","
embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/tags?post=5647794"},{"taxonomy":"glossary","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/glossary?post=5647794"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}