
GPTQ

GPTQ is a one-shot (post-training) weight quantization method based on approximate second-order information. It can compress GPT-scale models with up to 175 billion parameters to 3 or 4 bits per weight with minimal loss of accuracy, enabling single-GPU execution and significant inference speedups over FP16.

Related content

How To Finetune GPT Like Large Language Models on a Custom Dataset