{"id":5649045,"date":"2023-10-03T07:00:41","date_gmt":"2023-10-03T11:00:41","guid":{"rendered":"https:\/\/lightning.ai\/pages\/?p=5649045"},"modified":"2023-10-03T13:36:26","modified_gmt":"2023-10-03T17:36:26","slug":"what-is-quantization","status":"publish","type":"post","link":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/","title":{"rendered":"What is Quantization"},"content":{"rendered":"<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Takeaways<\/h3> <span style=\"font-weight: 400;\">Learn how quantization reduces the memory footprint of models like Llama 2 by as much as 4x!<\/span> <\/div>\n<h2>Introduction<\/h2>\n<p>The aim of quantization is to reduce the memory usage of the model parameters by using lower precision types than your typical float32 or (b)float16. Lower bit widths such as 8-bit and 4-bit use less memory than float32 (32-bit) and (b)float16 (16-bit). The quantization procedure does not simply trim the number of bits used; it compresses the values to reduce the amount of information lost.<\/p>\n<p>Using quantization to compress models that have billions of parameters, like Llama 2 or SDXL, makes deployment on edge devices with less memory capacity possible. 
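To make the savings concrete, here is a back-of-the-envelope sketch (plain arithmetic, not Fabric code) of how much memory the weights of a 7-billion-parameter model such as Llama 2 7B occupy at each precision. This counts weights only; activations, the KV cache, and quantization constants add some overhead on top.

```python
# Rough memory footprint of a 7B-parameter model's weights at different precisions.
N_PARAMS = 7_000_000_000

bytes_per_param = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "nf4": 0.5,  # 4 bits per weight
}

for dtype, nbytes in bytes_per_param.items():
    gib = N_PARAMS * nbytes / 1024**3
    print(f"{dtype:>8}: {gib:6.1f} GiB")
# float32 is roughly 26 GiB, while nf4 is roughly 3.3 GiB:
# an 8x reduction vs. float32 and 4x vs. bfloat16.
```

This is why a model that cannot fit on a consumer GPU in float32 can often run comfortably once quantized to 8 or 4 bits.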
Thankfully, Lightning Fabric makes quantization as easy as setting a mode flag!<\/p>\n<pre class=\"snippet-shortcode code-shortcode dark-theme collapse-false\"><code class=\"hljs language-python\">from lightning_fabric import Fabric\r\nfrom lightning_fabric.plugins import BitsandbytesPrecision\r\n\r\n# all available quantization modes\r\n# \"nf4\", \"nf4-dq\", \"fp4\", \"fp4-dq\", \"int8\", \"int8-training\"\r\n\r\nmode = \"nf4\"\r\nplugin = BitsandbytesPrecision(mode=mode)\r\nfabric = Fabric(plugins=plugin)\r\n\r\nmodel = CustomModule() # your PyTorch model\r\nmodel = fabric.setup_module(model) # quantizes the layers<\/code><div class=\"copy-button\"><button class=\"expand-button active\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>Learning about quantization is essential for moving models from idea to training and, ultimately, to production at the edge. We\u2019ll cover 8-bit, 4-bit, and double quantization below.<\/p>\n<h2>8-bit Quantization<\/h2>\n<p>8-bit quantization is discussed in the popular paper <a href=\"https:\/\/arxiv.org\/abs\/2110.02861\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">8-bit Optimizers via Block-wise Quantization<\/span><\/a>, and dedicated 8-bit floating-point formats were later proposed in <a href=\"https:\/\/arxiv.org\/pdf\/2209.05433.pdf\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">FP8 Formats for Deep Learning<\/span><\/a>. As stated in the original paper, 8-bit quantization was the natural progression after 16-bit precision. The implementation, however, was not as simple as moving from FP32 to FP16, because those two floating-point types share the same representation scheme while 8-bit formats do not.<\/p>\n<p>8-bit quantization requires a new representation scheme, and this new scheme can represent far fewer distinct values than FP16 or FP32. Model performance may therefore be affected by quantization, so it is good to be aware of this trade-off. 
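To see where information is lost, here is a minimal absmax round trip to 8-bit integers. This is a simplified sketch of the idea, not bitsandbytes\u2019 actual implementation: each value is scaled so the largest magnitude maps to 127, rounded to an integer code, and later rescaled back.

```python
def absmax_quantize(values):
    # Scale so the largest magnitude maps to 127, then round each value
    # to an integer code in [-127, 127].
    scale = 127.0 / max(abs(v) for v in values)
    return [round(v * scale) for v in values], scale

def dequantize(codes, scale):
    # Map integer codes back to floats; rounding error remains.
    return [c / scale for c in codes]

weights = [0.91, -0.42, 0.0, 2.37, -1.05]
codes, scale = absmax_quantize(weights)
recovered = dequantize(codes, scale)
# `codes` are small integers that fit in 8 bits; `recovered` is close to
# `weights`, and the remaining gap is the quantization error.
```

In practice bitsandbytes applies this idea block-wise, keeping a separate scale per block of values, so a single outlier cannot ruin the precision of an entire tensor.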
Additionally, the model should be evaluated in its quantized form if the weights will be deployed on an edge device that requires quantization.<\/p>\n<p>Lightning Fabric can use 8-bit quantization by setting the mode flag to int8-training for training, or int8 for inference.<\/p>\n<pre class=\"snippet-shortcode code-shortcode dark-theme collapse-false\"><code class=\"hljs language-python\">from lightning_fabric import Fabric\r\nfrom lightning_fabric.plugins import BitsandbytesPrecision\r\n\r\n# available 8-bit quantization modes\r\n# (\"int8\", \"int8-training\")\r\n\r\nmode = \"int8\"\r\nplugin = BitsandbytesPrecision(mode=mode)\r\nfabric = Fabric(plugins=plugin)\r\n\r\nmodel = CustomModule() # your PyTorch model\r\nmodel = fabric.setup_module(model) # quantizes the layers<\/code><div class=\"copy-button\"><button class=\"expand-button active\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<p>Just as 8-bit quantization is the natural progression from 16-bit precision, 4-bit quantization is the next smallest representation scheme. Let\u2019s talk about 4-bit quantization in the following sections.<\/p>\n<h2>4-bit Quantization<\/h2>\n<p>4-bit quantization is discussed in the popular paper <a href=\"https:\/\/arxiv.org\/abs\/2305.14314\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">QLoRA: Efficient Finetuning of Quantized LLMs<\/span><\/a>. QLoRA is a finetuning method that uses 4-bit quantization. 
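At 4 bits, each weight is stored as an index into a fixed table of only 16 levels. The sketch below illustrates that encode/decode idea; note the evenly spaced levels here are purely illustrative, whereas the real NF4 data type places its 16 levels at quantiles of a standard normal distribution, and bitsandbytes performs this block-wise in optimized kernels.

```python
# 16 illustrative levels evenly spanning [-1, 1]; real NF4 uses
# normal-distribution quantiles instead.
LEVELS = [i / 7.5 - 1.0 for i in range(16)]

def encode(values):
    # Normalize by the block's absmax, then store the index of the
    # nearest level for each weight (each index fits in 4 bits).
    absmax = max(abs(v) for v in values)
    idx = [min(range(16), key=lambda i: abs(v / absmax - LEVELS[i])) for v in values]
    return idx, absmax

def decode(indices, absmax):
    # Look each index back up in the table and rescale.
    return [LEVELS[i] * absmax for i in indices]

w = [0.8, -0.3, 0.05, -1.6]
idx, absmax = encode(w)
w_hat = decode(idx, absmax)
# `w_hat` approximates `w`; the error per weight is at most half the
# spacing between adjacent levels, scaled by absmax.
```

With only 16 representable values per block, the choice of where those levels sit matters a great deal, which is why NF4's normal-quantile spacing outperforms evenly spaced levels on normally distributed weights.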
The paper introduces this finetuning technique and demonstrates how it can be used to \u201cfinetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance\u201d by using the NF4 (NormalFloat) format.<\/p>\n<p>Lightning Fabric can use 4-bit quantization by setting the mode flag to either nf4 or fp4.<\/p>\n<pre class=\"snippet-shortcode code-shortcode dark-theme collapse-false\"><code class=\"hljs language-python\">from lightning_fabric import Fabric\r\nfrom lightning_fabric.plugins import BitsandbytesPrecision\r\n\r\n# available 4-bit quantization modes\r\n# (\"nf4\", \"fp4\")\r\n\r\nmode = \"nf4\"\r\nplugin = BitsandbytesPrecision(mode=mode)\r\nfabric = Fabric(plugins=plugin)\r\n\r\nmodel = CustomModule() # your PyTorch model\r\nmodel = fabric.setup_module(model) # quantizes the layers<\/code><div class=\"copy-button\"><button class=\"expand-button active\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<h2>Double Quantization<\/h2>\n<p>Double quantization is an additional 4-bit quantization setting introduced alongside NF4 in QLoRA: Efficient Finetuning of Quantized LLMs. 
Double quantization works by quantizing the quantization constants themselves, the per-block scaling factors that bitsandbytes stores internally, which saves additional memory (about 0.37 bits per parameter on average, according to the QLoRA paper).<\/p>\n<p>Lightning Fabric can use 4-bit double quantization by setting the mode flag to either nf4-dq or fp4-dq.<\/p>\n<pre class=\"snippet-shortcode code-shortcode dark-theme collapse-false\"><code class=\"hljs language-python\">from lightning_fabric import Fabric\r\nfrom lightning_fabric.plugins import BitsandbytesPrecision\r\n\r\n# available 4-bit double quantization modes\r\n# (\"nf4-dq\", \"fp4-dq\")\r\n\r\nmode = \"nf4-dq\"\r\nplugin = BitsandbytesPrecision(mode=mode)\r\nfabric = Fabric(plugins=plugin)\r\n\r\nmodel = CustomModule() # your PyTorch model\r\nmodel = fabric.setup_module(model) # quantizes the layers<\/code><div class=\"copy-button\"><button class=\"expand-button active\">Expand<\/button><button class=\"copy\">Copy<\/button><\/div><\/pre>\n<h2>Conclusion<\/h2>\n<p>Quantization is a must for most production systems, given that edge devices and consumer-grade hardware typically require models with a much smaller memory footprint than datacenter hardware such as NVIDIA\u2019s A100 80GB can accommodate. Learning about this technique will give you a better understanding of how models like Llama 2 and SDXL are deployed, and of the memory constraints edge devices impose in robotics, vehicles, and other systems.<\/p>\n<h2>Still have questions?<\/h2>\n<p>We have an amazing community and team of core engineers ready to answer your questions. So, join us on <a href=\"https:\/\/lightning.ai\/forums\/\" target=\"_blank\" rel=\"noopener\">Discourse<\/a> or <a href=\"https:\/\/discord.gg\/XncpTy7DSt\" target=\"_blank\" rel=\"noopener\">Discord<\/a>. 
See you there!<\/p>\n<h2>Resources and References<\/h2>\n<p><a href=\"https:\/\/pytorch.org\/blog\/introduction-to-quantization-on-pytorch\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Introduction to Quantization<\/span><\/a><br \/>\n<a href=\"https:\/\/pytorch.org\/docs\/stable\/quantization.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Introduction to Quantization and API Summary<\/span><\/a><br \/>\n<a href=\"https:\/\/pytorch.org\/blog\/quantization-in-practice\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Quantization in Practice<\/span><\/a><br \/>\n<a href=\"https:\/\/pytorch.org\/TensorRT\/tutorials\/ptq.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Post Training Quantization<\/span><\/a><br \/>\n<a href=\"https:\/\/lightning.ai\/docs\/fabric\/latest\/fundamentals\/precision.html#quantization-via-bitsandbytes\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Quantization in Lightning Fabric<\/span><\/a><br \/>\n<a href=\"https:\/\/arxiv.org\/pdf\/2209.05433.pdf\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">FP8 Formats for Deep Learning<\/span><\/a><br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2110.02861\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">8-bit Optimizers via Block-wise Quantization<\/span><\/a><br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2305.14314\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">QLoRA: Efficient Finetuning of Quantized LLMs<\/span><\/a><br \/>\n<a href=\"https:\/\/arxiv.org\/abs\/2210.17323\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers<\/span><\/a><br \/>\n<a href=\"https:\/\/blogs.nvidia.com\/blog\/2020\/05\/14\/tensorfloat-32-precision-format\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 
400;\">TensorFloat-32 in the A100 GPU Accelerates AI Training, HPC up to 20x<\/span><\/a><br \/>\n<a href=\"https:\/\/developer.nvidia.com\/automatic-mixed-precision\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Automatic Mixed Precision for Deep Learning<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction The aim of quantization is to reduce the memory usage of the model parameters by using lower precision types than your typical float32 or (b)float16. Using lower bit widths like 8-bit and 4-bit uses less memory compared to float32 (32-bit) and (b)float16 (16-bit). The quantization procedure does not simply trim the number of bits<a class=\"excerpt-read-more\" href=\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/\" title=\"ReadWhat is Quantization\">&#8230; Read more &raquo;<\/a><\/p>\n","protected":false},"author":16,"featured_media":5649048,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[27,29,41],"tags":[],"glossary":[],"acf":{"mathjax":false,"default_editor":true,"show_table_of_contents":false,"additional_authors":false,"hide_from_archive":false,"content_type":"Blog Post","sticky":false,"code_embed":true,"tabs":false,"custom_styles":"","code_shortcode":[{"shortcode_title":"all_modes","code":"from lightning_fabric import Fabric\r\nfrom lightning_fabric.plugins import BitsandbytesPrecision\r\n\r\n# all available quantization modes\r\n# \"nf4\", \"nf4-dq\", \"fp4\", \"fp4-dq\", \"int8\", \"int8-training\"\r\n\r\nmode = \"nf4\"\r\nplugin = BitsandbytesPrecision(mode=mode)\r\nfabric = Fabric(plugins=plugin)\r\n\r\nmodel = CustomModule() # your PyTorch model\r\nmodel = fabric.setup_module(model) # quantizes the layers","syntax":"python","collapse":true},{"shortcode_title":"8_bit","code":"from lightning_fabric import Fabric\r\nfrom 
lightning_fabric.plugins import BitsandbytesPrecision\r\n\r\n# available 8-bit quantization modes\r\n# (\"int8\", \"int8-training\")\r\n\r\nmode = \"int8\"\r\nplugin = BitsandbytesPrecision(mode=mode)\r\nfabric = Fabric(plugins=plugin)\r\n\r\nmodel = CustomModule() # your PyTorch model\r\nmodel = fabric.setup_module(model) # quantizes the layers","syntax":"python","collapse":true},{"shortcode_title":"4_bit","code":"from lightning_fabric import Fabric\r\nfrom lightning_fabric.plugins import BitsandbytesPrecision\r\n\r\n# available 4-bit quantization modes\r\n# (\"nf4\", \"fp4\")\r\n\r\nmode = \"nf4\"\r\nplugin = BitsandbytesPrecision(mode=mode)\r\nfabric = Fabric(plugins=plugin)\r\n\r\nmodel = CustomModule() # your PyTorch model\r\nmodel = fabric.setup_module(model) # quantizes the layers","syntax":"python","collapse":true},{"shortcode_title":"4_bit_dq","code":"from lightning_fabric import Fabric\r\nfrom lightning_fabric.plugins import BitsandbytesPrecision\r\n\r\n# available 4-bit double quantization modes\r\n# (\"nf4-dq\", \"fp4-dq\")\r\n\r\nmode = \"nf4-dq\"\r\nplugin = BitsandbytesPrecision(mode=mode)\r\nfabric = Fabric(plugins=plugin)\r\n\r\nmodel = CustomModule() # your PyTorch model\r\nmodel = fabric.setup_module(model) # quantizes the layers","syntax":"python","collapse":true}]},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>What is Quantization - Lightning AI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What is Quantization - Lightning AI\" \/>\n<meta property=\"og:description\" content=\"Introduction The aim of quantization is to reduce the 
memory usage of the model parameters by using lower precision types than your typical float32 or (b)float16. Using lower bit widths like 8-bit and 4-bit uses less memory compared to float32 (32-bit) and (b)float16 (16-bit). The quantization procedure does not simply trim the number of bits... Read more &raquo;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/\" \/>\n<meta property=\"og:site_name\" content=\"Lightning AI\" \/>\n<meta property=\"article:published_time\" content=\"2023-10-03T11:00:41+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-10-03T17:36:26+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"1200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"JP Hennessy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:site\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"JP Hennessy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/\"},\"author\":{\"name\":\"JP Hennessy\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\"},\"headline\":\"What is Quantization\",\"datePublished\":\"2023-10-03T11:00:41+00:00\",\"dateModified\":\"2023-10-03T17:36:26+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/\"},\"wordCount\":643,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png\",\"articleSection\":[\"Articles\",\"Blog\",\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/\",\"url\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/\",\"name\":\"What is Quantization - Lightning 
AI\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png\",\"datePublished\":\"2023-10-03T11:00:41+00:00\",\"dateModified\":\"2023-10-03T17:36:26+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#primaryimage\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png\",\"width\":1200,\"height\":1200},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lightning.ai\/pages\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"What is Quantization\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lightning.ai\/pages\/#website\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"name\":\"Lightning AI\",\"description\":\"The platform for teams to build 
AI.\",\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lightning.ai\/pages\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\",\"name\":\"Lightning AI\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"width\":1744,\"height\":856,\"caption\":\"Lightning AI\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/LightningAI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\",\"name\":\"JP Hennessy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"caption\":\"JP Hennessy\"},\"url\":\"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"What is Quantization - Lightning AI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/","og_locale":"en_US","og_type":"article","og_title":"What is Quantization - Lightning AI","og_description":"Introduction The aim of quantization is to reduce the memory usage of the model parameters by using lower precision types than your typical float32 or (b)float16. Using lower bit widths like 8-bit and 4-bit uses less memory compared to float32 (32-bit) and (b)float16 (16-bit). The quantization procedure does not simply trim the number of bits... Read more &raquo;","og_url":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/","og_site_name":"Lightning AI","article_published_time":"2023-10-03T11:00:41+00:00","article_modified_time":"2023-10-03T17:36:26+00:00","og_image":[{"width":1200,"height":1200,"url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png","type":"image\/png"}],"author":"JP Hennessy","twitter_card":"summary_large_image","twitter_creator":"@LightningAI","twitter_site":"@LightningAI","twitter_misc":{"Written by":"JP Hennessy","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#article","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/"},"author":{"name":"JP Hennessy","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6"},"headline":"What is Quantization","datePublished":"2023-10-03T11:00:41+00:00","dateModified":"2023-10-03T17:36:26+00:00","mainEntityOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/"},"wordCount":643,"commentCount":0,"publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png","articleSection":["Articles","Blog","Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/","url":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/","name":"What is Quantization - Lightning 
AI","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#primaryimage"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png","datePublished":"2023-10-03T11:00:41+00:00","dateModified":"2023-10-03T17:36:26+00:00","breadcrumb":{"@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#primaryimage","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/10\/What-is-Quantization-1.png","width":1200,"height":1200},{"@type":"BreadcrumbList","@id":"https:\/\/lightning.ai\/pages\/community\/article\/what-is-quantization\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lightning.ai\/pages\/"},{"@type":"ListItem","position":2,"name":"What is Quantization"}]},{"@type":"WebSite","@id":"https:\/\/lightning.ai\/pages\/#website","url":"https:\/\/lightning.ai\/pages\/","name":"Lightning AI","description":"The platform for teams to build 
AI.","publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lightning.ai\/pages\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/lightning.ai\/pages\/#organization","name":"Lightning AI","url":"https:\/\/lightning.ai\/pages\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","width":1744,"height":856,"caption":"Lightning AI"},"image":{"@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/LightningAI"]},{"@type":"Person","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6","name":"JP Hennessy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","caption":"JP 
Hennessy"},"url":"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/"}]}},"_links":{"self":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5649045"}],"collection":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/comments?post=5649045"}],"version-history":[{"count":0,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5649045\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media\/5649048"}],"wp:attachment":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media?parent=5649045"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/categories?post=5649045"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/tags?post=5649045"},{"taxonomy":"glossary","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/glossary?post=5649045"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}