{"id":5648217,"date":"2023-06-09T12:25:31","date_gmt":"2023-06-09T16:25:31","guid":{"rendered":"https:\/\/lightning.ai\/pages\/?p=5648217"},"modified":"2023-07-14T09:39:14","modified_gmt":"2023-07-14T13:39:14","slug":"falcon-a-guide-to-finetune-and-inference","status":"publish","type":"post","link":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/","title":{"rendered":"Falcon &#8211; A guide to finetune and inference"},"content":{"rendered":"<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Takeaways<\/h3><br \/>\nIn this blog you will learn about the latest open-source large language model <a href=\"https:\/\/falconllm.tii.ae\/\">Falcon<\/a>. How to efficiently fine-tune Falcon and run inference on consumer-grade hardware with less than 4.5 GB of GPU memory!<br \/>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-5648220 size-full\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png\" alt=\"\" width=\"1264\" height=\"570\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png 1264w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5-300x135.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5-1024x462.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5-300x135@2x.png 600w\" sizes=\"(max-width: 1264px) 100vw, 1264px\" \/><\/p>\n<p><a href=\"https:\/\/falconllm.tii.ae\/\">Falcon<\/a> is the latest open-source large language model released by <a href=\"https:\/\/tii.ae\/\">Technology Innovation Institute<\/a>. It is an autoregressive decoder-only model with two variants: a 7 billion parameter model and a 40 billion parameter model. 
The 40B model variant was trained on 384 GPUs on AWS for 2 months.<\/p>\n<p>We have integrated <strong>Falcon<\/strong> into <strong><a href=\"https:\/\/github.com\/Lightning-AI\/lit-gpt\">Lit-GPT<\/a><\/strong>. You can use it for finetuning and for running inference with quantization in less than 4.5 GB of GPU memory.<\/p>\n<h2>Finetune Falcon in 3 steps<\/h2>\n<p>You can finetune the 7B variant on a single GPU with at least 14 GB of memory when using parameter-efficient finetuning techniques. Finetuning Falcon 40B on the Alpaca instruction dataset can be reduced from 30 hours to 30 minutes by using the LLaMA-Adapter technique on 8 A100 GPUs.<\/p>\n<p>Lit-GPT provides scripts for downloading weights, preparing datasets, finetuning, and performing inference. To finetune Falcon on a custom dataset, follow these three steps:<\/p>\n<ol>\n<li>Download the weights.<\/li>\n<li>Prepare the dataset.<\/li>\n<li>Perform finetuning.<\/li>\n<\/ol>\n<h3>Download and convert the Falcon weights<\/h3>\n<p>This blog post uses the Falcon-7B variant, but you can also run all the scripts with the 40B variant.<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\n# download the model weights<br \/>\npython scripts\/download.py --repo_id tiiuae\/falcon-7b\n\n# convert the weights to Lit-GPT format<br \/>\npython scripts\/convert_hf_checkpoint.py --checkpoint_dir checkpoints\/tiiuae\/falcon-7b<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<h3>Prepare the dataset<\/h3>\n<p>We will use the <a href=\"https:\/\/github.com\/tatsu-lab\/stanford_alpaca\">Alpaca dataset<\/a> from Stanford, a dataset of 52K instruction-following examples. 
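<\/p>\n<p>Each record in the raw Alpaca data is a JSON object with an <code>instruction<\/code>, an optional <code>input<\/code>, and the expected <code>output<\/code>. The record below is illustrative; if you reuse the preparation script, your own dataset should follow the same shape:<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\n{<br \/>\n    \"instruction\": \"Give three tips for staying healthy.\",<br \/>\n    \"input\": \"\",<br \/>\n    \"output\": \"1. Eat a balanced diet. 2. Exercise regularly. 3. Get enough sleep.\"<br \/>\n}<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>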
You can customize the <code><a href=\"https:\/\/github.com\/Lightning-AI\/lit-gpt\/blob\/main\/scripts\/prepare_alpaca.py\">prepare_alpaca<\/a><\/code> script to use your own custom dataset.<\/p>\n<p>The <code><a href=\"https:\/\/github.com\/Lightning-AI\/lit-gpt\/blob\/main\/scripts\/prepare_alpaca.py\">prepare_alpaca<\/a><\/code> script provides a <code>prepare<\/code> function that loads the raw instruction dataset, creates prompts, and tokenizes them using the model tokenizer provided in the <code>checkpoint_dir<\/code>. The tokenized data is split into training and test sets based on the provided <code>test_split_size<\/code> and saved to the <code>destination_path<\/code>.<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\npython scripts\/prepare_alpaca.py \\<br \/>\n--destination_path data\/alpaca \\<br \/>\n--checkpoint_dir checkpoints\/tiiuae\/falcon-7b<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>You can follow the <a href=\"https:\/\/lightning.ai\/pages\/blog\/how-to-finetune-gpt-like-large-language-models-on-a-custom-dataset\/\">how to finetune LLMs on a custom dataset<\/a> blog post for a step-by-step tutorial.<\/p>\n<h3>Finetuning the Falcon model<\/h3>\n<p>Once you have prepared your dataset, it is pretty straightforward to finetune the model. You can adjust the <code>micro_batch_size<\/code>, number of <code>devices<\/code>, epochs, warmup, and other hyperparameters at the top of the <a href=\"https:\/\/github.com\/Lightning-AI\/lit-gpt\/blob\/main\/finetune\/adapter_v2.py\">finetuning script<\/a>. 
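<\/p>\n<p>For example, the hyperparameter block at the top of the script looks roughly like the following (the names and default values here are illustrative; check the script itself for the current defaults):<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\n# illustrative hyperparameters defined at the top of finetune\/adapter_v2.py<br \/>\ndevices = 1<br \/>\nmicro_batch_size = 4<br \/>\nnum_epochs = 5<br \/>\nlearning_rate = 3e-3<br \/>\nwarmup_steps = 100<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>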
We will be using the default hyperparameters to finetune Falcon on the Alpaca dataset using the AdapterV2 technique.<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\npython finetune\/adapter_v2.py \\<br \/>\n--data_dir data\/alpaca \\<br \/>\n--checkpoint_dir checkpoints\/tiiuae\/falcon-7b \\<br \/>\n--out_dir out\/adapter\/alpaca<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>You can find the model checkpoints in the <code>out\/adapter\/alpaca<\/code> folder and use the generation script to play around with the model. It takes approximately half an hour to finetune the model on 8 A100 GPUs, or approximately 3 hours on a single GPU.<\/p>\n<h2>Running inference with the finetuned model<\/h2>\n<p>You can use the finetuned checkpoint of your LLM to generate text. Lit-GPT provides generation scripts; we will use the <code><a href=\"https:\/\/github.com\/Lightning-AI\/lit-gpt\/blob\/main\/generate\/adapter_v2.py\">adapter_v2<\/a><\/code> generation script. 
It supports <code>int8<\/code> and <code>int4<\/code> quantization for devices with limited GPU memory.<\/p>\n<p>To run inference in under 10 GB of GPU memory, use <code>int8<\/code> precision by passing <code>llm.int8<\/code> to the <code>--quantize<\/code> argument.<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\npython generate\/adapter_v2.py \\<br \/>\n--adapter_path out\/adapter\/alpaca\/lit_model_adapter_finetuned.pth \\<br \/>\n--checkpoint_dir checkpoints\/tiiuae\/falcon-7b \\<br \/>\n--quantize llm.int8 \\<br \/>\n--prompt \"What food do llamas eat?\"<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>If you have limited GPU memory and want to run Falcon-7B inference using less than 4.5 GB of memory, you can use <code>int4<\/code> precision. For Falcon-40B, this reduces memory usage from 80 GB to around 24 GB (note that the quantization process itself consumes around 32 GB). Lit-GPT provides a <a href=\"https:\/\/github.com\/Lightning-AI\/lit-gpt\/blob\/main\/quantize\/gptq.py\">GPTQ conversion script<\/a>.<\/p>\n<pre class=\"code-shortcode dark-theme window- collapse-false \" style=\"--height:falsepx\"><code class=\"language-python\"><br \/>\npython generate\/adapter_v2.py \\<br \/>\n--adapter_path out\/adapter\/alpaca\/lit_model_adapter_finetuned.pth \\<br \/>\n--checkpoint_dir checkpoints\/tiiuae\/falcon-7b \\<br \/>\n--quantize gptq.int4 \\<br \/>\n--prompt \"What food do llamas eat?\"<br \/>\n<\/code><div class=\"copy-button\"><button class=\"expand-button\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>Below is a benchmark that shows how precision affects GPU memory and inference speed. 
For <code>int4<\/code>, we are using the Triton kernels released with the original <a href=\"https:\/\/arxiv.org\/pdf\/2210.17323.pdf\">GPTQ<\/a> paper, which are slower but can be optimized with newer GPTQ implementations or alternative quantization methods. <a href=\"https:\/\/twitter.com\/LightningAI\">Stay tuned<\/a> to find out when we optimize the <code>int4<\/code> precision!<\/p>\n<div id=\"attachment_5648221\" style=\"width: 1034px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5648221\" class=\"wp-image-5648221 size-large\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-6-1024x299.png\" alt=\"\" width=\"1024\" height=\"299\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-6-1024x299.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-6-300x88.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-6.png 1308w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-6-300x88@2x.png 600w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><p id=\"caption-attachment-5648221\" class=\"wp-caption-text\">Evaluated on Nvidia A100<\/p><\/div>\n<h2>Learn more about large language models and efficient fine-tuning techniques \ud83d\udc47<\/h2>\n<p>This article provided short, step-by-step instructions for finetuning your own Falcon model. We saw that, thanks to parameter-efficient finetuning techniques, it\u2019s possible to do so on a single GPU. 
If you want to learn more about these techniques, check out our more in-depth guides below.<\/p>\n<ul>\n<li><a href=\"https:\/\/lightning.ai\/pages\/blog\/how-to-finetune-gpt-like-large-language-models-on-a-custom-dataset\/\">How To Finetune GPT Like Large Language Models on a Custom Dataset<\/a><\/li>\n<li><a href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/accelerating-large-language-models-with-mixed-precision-techniques\/\">Accelerating Large Language Models with Mixed-Precision Techniques<\/a><\/li>\n<li><a href=\"https:\/\/lightning.ai\/pages\/community\/tutorial\/lora-llm\/\">Parameter-Efficient LLM Finetuning With Low-Rank Adaptation (LoRA)<\/a><\/li>\n<li><a href=\"https:\/\/lightning.ai\/pages\/community\/article\/understanding-llama-adapters\/\">Understanding Parameter-Efficient Finetuning of Large Language Models: From Prefix Tuning to LLaMA-Adapters<\/a><\/li>\n<\/ul>\n<p>Join our <a href=\"https:\/\/discord.gg\/nnAuZvqTu3\">Discord community<\/a> to chat and ask your questions!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Falcon is the latest open-source large language model released by Technology Innovation Institute. It is an autoregressive decoder-only model with two variants: a 7 billion parameter model and a 40 billion parameter model. The 40B model variant was trained on 384 GPUs on AWS for 2 months. We have integrated Falcon into Lit-GPT. 
You<a class=\"excerpt-read-more\" href=\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/\" title=\"ReadFalcon &#8211; A guide to finetune and inference\">&#8230; Read more &raquo;<\/a><\/p>\n","protected":false},"author":16,"featured_media":5648220,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27,29,106,41],"tags":[201,193,188],"glossary":[203,212,213,223],"acf":{"additional_authors":false,"mathjax":false,"default_editor":true,"show_table_of_contents":false,"hide_from_archive":false,"content_type":"Blog Post","sticky":true,"custom_styles":""},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Falcon - A guide to finetune and inference - Lightning AI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Falcon - A guide to finetune and inference - Lightning AI\" \/>\n<meta property=\"og:description\" content=\"&nbsp; Falcon is the latest open-source large language model released by Technology Innovation Institute. It is an autoregressive decoder-only model with two variants: a 7 billion parameter model and a 40 billion parameter model. The 40B model variant was trained on 384 GPUs on AWS for 2 months. We have integrated Falcon into Lit-GPT. You... 
Read more &raquo;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/\" \/>\n<meta property=\"og:site_name\" content=\"Lightning AI\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-09T16:25:31+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-07-14T13:39:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1264\" \/>\n\t<meta property=\"og:image:height\" content=\"570\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"JP Hennessy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:site\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"JP Hennessy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/\"},\"author\":{\"name\":\"JP Hennessy\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\"},\"headline\":\"Falcon &#8211; A guide to finetune and inference\",\"datePublished\":\"2023-06-09T16:25:31+00:00\",\"dateModified\":\"2023-07-14T13:39:14+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/\"},\"wordCount\":818,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png\",\"keywords\":[\"Falcon\",\"LLaMA\",\"LLMs\"],\"articleSection\":[\"Articles\",\"Blog\",\"Community\",\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/\",\"url\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/\",\"name\":\"Falcon - A guide to finetune and inference - Lightning 
AI\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png\",\"datePublished\":\"2023-06-09T16:25:31+00:00\",\"dateModified\":\"2023-07-14T13:39:14+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#primaryimage\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png\",\"width\":1264,\"height\":570},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lightning.ai\/pages\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Falcon &#8211; A guide to finetune and inference\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lightning.ai\/pages\/#website\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"name\":\"Lightning AI\",\"description\":\"The platform for teams to build 
AI.\",\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lightning.ai\/pages\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\",\"name\":\"Lightning AI\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"width\":1744,\"height\":856,\"caption\":\"Lightning AI\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/LightningAI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\",\"name\":\"JP Hennessy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"caption\":\"JP Hennessy\"},\"url\":\"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Falcon - A guide to finetune and inference - Lightning AI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/","og_locale":"en_US","og_type":"article","og_title":"Falcon - A guide to finetune and inference - Lightning AI","og_description":"&nbsp; Falcon is the latest open-source large language model released by Technology Innovation Institute. It is an autoregressive decoder-only model with two variants: a 7 billion parameter model and a 40 billion parameter model. The 40B model variant was trained on 384 GPUs on AWS for 2 months. We have integrated Falcon into Lit-GPT. You... Read more &raquo;","og_url":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/","og_site_name":"Lightning AI","article_published_time":"2023-06-09T16:25:31+00:00","article_modified_time":"2023-07-14T13:39:14+00:00","og_image":[{"width":1264,"height":570,"url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png","type":"image\/png"}],"author":"JP Hennessy","twitter_card":"summary_large_image","twitter_creator":"@LightningAI","twitter_site":"@LightningAI","twitter_misc":{"Written by":"JP Hennessy","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#article","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/"},"author":{"name":"JP Hennessy","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6"},"headline":"Falcon &#8211; A guide to finetune and inference","datePublished":"2023-06-09T16:25:31+00:00","dateModified":"2023-07-14T13:39:14+00:00","mainEntityOfPage":{"@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/"},"wordCount":818,"commentCount":0,"publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"image":{"@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png","keywords":["Falcon","LLaMA","LLMs"],"articleSection":["Articles","Blog","Community","Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/","url":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/","name":"Falcon - A guide to finetune and inference - Lightning 
AI","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#primaryimage"},"image":{"@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png","datePublished":"2023-06-09T16:25:31+00:00","dateModified":"2023-07-14T13:39:14+00:00","breadcrumb":{"@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#primaryimage","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/06\/Untitled-5.png","width":1264,"height":570},{"@type":"BreadcrumbList","@id":"https:\/\/lightning.ai\/pages\/blog\/falcon-a-guide-to-finetune-and-inference\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lightning.ai\/pages\/"},{"@type":"ListItem","position":2,"name":"Falcon &#8211; A guide to finetune and inference"}]},{"@type":"WebSite","@id":"https:\/\/lightning.ai\/pages\/#website","url":"https:\/\/lightning.ai\/pages\/","name":"Lightning AI","description":"The platform for teams to build 
AI.","publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lightning.ai\/pages\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/lightning.ai\/pages\/#organization","name":"Lightning AI","url":"https:\/\/lightning.ai\/pages\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","width":1744,"height":856,"caption":"Lightning AI"},"image":{"@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/LightningAI"]},{"@type":"Person","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6","name":"JP Hennessy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","caption":"JP 
Hennessy"},"url":"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/"}]}},"_links":{"self":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5648217"}],"collection":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/comments?post=5648217"}],"version-history":[{"count":0,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5648217\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media\/5648220"}],"wp:attachment":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media?parent=5648217"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/categories?post=5648217"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/tags?post=5648217"},{"taxonomy":"glossary","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/glossary?post=5648217"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}