
GPTQ

GPTQ is a one-shot (post-training) weight quantization method based on approximate second-order information. It can compress GPT-scale models with up to 175 billion parameters to 3 or 4 bits per weight with minimal loss of accuracy, enabling single-GPU execution and significant inference speedups over FP16.

Related content

How To Finetune GPT Like Large Language Models on a Custom Dataset