thunder.plugins.QuantizeInt4

class thunder.plugins.QuantizeInt4[source]

Bases: Plugin

Plugin for 4-bit integer quantization using BitsAndBytes.

This plugin applies a 4-bit linear quantization transform to the model's weights, which reduces their memory footprint and can improve throughput for both training and inference.

See https://github.com/bitsandbytes-foundation/bitsandbytes/blob/main/bitsandbytes/functional.py#L889 for more details.
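To make the idea of 4-bit linear quantization concrete, the following is a minimal pure-Python sketch of blockwise absmax quantization. It is an illustration only, not the BitsAndBytes implementation and not what this plugin executes: the function names (`quantize_int4`, `pack_int4`, `dequantize_int4`), the block size of 64, and the symmetric code range [-7, 7] are all assumptions chosen for clarity.

```python
def quantize_int4(values, block_size=64):
    """Map floats to signed 4-bit codes in [-7, 7], with one scale per block.

    Hypothetical helper for illustration; not part of Thunder or BitsAndBytes.
    """
    codes, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block)
        # One scale per block so the largest magnitude maps to code +/-7.
        scale = absmax / 7 if absmax > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-7, min(7, round(v / scale))) for v in block)
    return codes, scales


def pack_int4(codes):
    """Pack two 4-bit codes into each byte, low nibble first."""
    padded = codes + [0] * (len(codes) % 2)  # pad to an even count
    packed = bytearray()
    for low, high in zip(padded[0::2], padded[1::2]):
        packed.append((low & 0xF) | ((high & 0xF) << 4))
    return bytes(packed)


def dequantize_int4(codes, scales, block_size=64):
    """Recover approximate floats by multiplying each code by its block scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    weights = [0.7, -0.35, 0.1, 0.0]
    codes, scales = quantize_int4(weights, block_size=4)
    print(codes, scales)
    print(dequantize_int4(codes, scales, block_size=4))
    print(len(pack_int4(codes)), "bytes for", len(weights), "weights")
```

In this sketch each weight costs half a byte plus a small per-block overhead for the scale, which is where the memory saving comes from. The reconstruction error of any weight is bounded by half of its block's scale.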

__init__()[source]

Methods

__init__()

setup_executors()

Fetches the BitsAndBytes quantization executor.

setup_lookasides()

Return type:

Optional[list[Lookaside]]

setup_transforms()

Fetches the BitsAndBytes quantization transform.

Attributes

policy

setup_executors()[source]

Fetches the BitsAndBytes quantization executor.

Returns:

A list containing the BitsAndBytes quantization executor.

Return type:

list[Executor]

setup_transforms()[source]

Fetches the BitsAndBytes quantization transform.

Returns:

A list containing the BitsAndBytes quantization transform.

Return type:

list[Transform]
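The three setup methods above share a simple contract: each returns a list, or possibly None in the case of setup_lookasides(). The sketch below shows one way a caller could gather those results from several plugins. It uses stand-in classes only: `SketchPlugin`, `SketchQuantizeInt4`, `collect`, and the placeholder strings are hypothetical, and how Thunder itself consumes these methods is not described on this page.

```python
from typing import Optional


class SketchPlugin:
    """Hypothetical stand-in for a plugin base class; not Thunder's Plugin."""

    def setup_executors(self) -> list:
        return []

    def setup_transforms(self) -> list:
        return []

    def setup_lookasides(self) -> Optional[list]:
        return None


class SketchQuantizeInt4(SketchPlugin):
    """Stand-in that mirrors the documented return shapes with placeholders."""

    def setup_executors(self) -> list:
        return ["bnb_executor"]  # placeholder for an Executor object

    def setup_transforms(self) -> list:
        return ["bnb_int4_transform"]  # placeholder for a Transform object


def collect(plugins):
    """Concatenate what each plugin contributes, treating None as empty."""
    executors, transforms, lookasides = [], [], []
    for plugin in plugins:
        executors.extend(plugin.setup_executors())
        transforms.extend(plugin.setup_transforms())
        lookasides.extend(plugin.setup_lookasides() or [])
    return executors, transforms, lookasides


if __name__ == "__main__":
    print(collect([SketchQuantizeInt4()]))
```

Treating a None result as an empty list keeps the collection loop uniform, which matters here because the documented return type of setup_lookasides() is Optional.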