class pytorch_lightning.callbacks.QuantizationAwareTraining(qconfig='fbgemm', observer_type='average', collect_quantization=None, modules_to_fuse=None, input_compatible=True, quantize_on_fit_end=True, observer_enabled_stages=('train',))[source]

Bases: pytorch_lightning.callbacks.base.Callback

Quantization allows speeding up inference and decreasing memory requirements by performing computations and storing tensors at lower bitwidths (such as INT8 or FLOAT16) than floating-point precision. We use the native PyTorch API, so for more information see PyTorch Quantization.
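As a rough illustration of how a lower bitwidth representation works, here is a minimal pure-Python sketch of affine (asymmetric) INT8 quantization with a per-tensor scale and zero point. The helper names are illustrative only, not PyTorch API, and this is not how PyTorch's kernels are implemented:

```python
# Hypothetical sketch of affine INT8 quantization: q = clamp(round(x / scale) + zp).

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float value to an INT8 code, clamped to the representable range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Recover an approximate float value: x ~= (q - zero_point) * scale."""
    return (q - zero_point) * scale

# Scale/zero point are chosen to cover an observed value range, e.g. [-1.0, 1.0];
# in QAT this range comes from the observers collecting statistics.
scale = 2.0 / 255
zero_point = 0

x = 0.5
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
assert abs(x - x_hat) <= scale / 2  # quantization error is at most half a step
```

Values outside the observed range saturate at the INT8 limits, which is why calibrating the range with observers matters.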


Warning: QuantizationAwareTraining is in beta and subject to change.

Parameters

  • qconfig (Union[str, QConfig]) – quantization configuration: either the name of a quantization backend such as 'fbgemm' (the default), or a custom torch.quantization.QConfig.

  • observer_type (str) – allows switching between MovingAverageMinMaxObserver as “average” (default) and HistogramObserver as “histogram” which is more computationally expensive.

  • collect_quantization (Union[Callable, int, None]) –

    count or custom function to collect quantization statistics:

    • None (default). The quantization observer is called on each module forward pass

      (useful for collecting extended statistics when using image/data augmentation).

    • int. Use it to set a fixed number of observer calls, counted from the beginning of training.

    • Callable. A custom function taking a single trainer argument.

      See this example, which triggers collection only on the last epoch:

      def custom_trigger_last(trainer):
          return trainer.current_epoch == (trainer.max_epochs - 1)

  • modules_to_fuse (Optional[Sequence]) – allows you to fuse a few layers together before quantization; to find which layer types can be fused, check the torch.quantization.fuse_modules documentation.

  • input_compatible (bool) – preserve quant/dequant layers. This allows you to feed the converted model the same inputs as the original model, but it breaks compatibility with TorchScript and export via torch.jit.save.

  • quantize_on_fit_end (bool) – perform the quantization in on_fit_end. Note that once converted, the model cannot be put in training mode again.

  • observer_enabled_stages (Sequence[str]) –

    allow fake-quantization modules’ observers to do calibration during provided stages:

    • 'train': the observers can do calibration during training.

    • 'validate': the observers can do calibration during validating. Note that we don’t disable observers during the sanity check as the model hasn’t been calibrated with training data yet. After the sanity check, the fake-quantization modules are restored to initial states.

    • 'test': the observers can do calibration during testing.

    • 'predict': the observers can do calibration during predicting.

    Note that we only handle observers belonging to fake-quantization modules. When qconfig is a str and observer_type is 'histogram', the observers won’t belong to any fake-quantization modules and will not be controlled by the callback.
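As a rough illustration of the stage gating described above (a toy sketch, not Lightning's internals), an observer can be enabled or disabled per stage so it only collects calibration statistics during the stages listed in observer_enabled_stages. All names here are hypothetical:

```python
# Illustrative sketch: gate observer calibration by stage, mirroring the
# observer_enabled_stages semantics. Not the actual Lightning implementation.

class ToyObserver:
    """Tracks the min/max of values it sees, like a MinMax observer."""
    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")
        self.enabled = True

    def observe(self, value):
        if self.enabled:
            self.min_val = min(self.min_val, value)
            self.max_val = max(self.max_val, value)

def set_observer_stage(observer, stage, enabled_stages=("train",)):
    """Enable calibration only when the current stage is in enabled_stages."""
    observer.enabled = stage in enabled_stages

obs = ToyObserver()
set_observer_stage(obs, "train")     # 'train' is enabled by default
obs.observe(-0.3)
obs.observe(0.9)
set_observer_stage(obs, "validate")  # not in the default tuple: disabled
obs.observe(5.0)                     # ignored; range stays [-0.3, 0.9]
assert (obs.min_val, obs.max_val) == (-0.3, 0.9)
```

This is why, with the default ('train',), validation or test batches do not shift the calibrated ranges.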

on_fit_end(trainer, pl_module)[source]

Called when fit ends.

Return type
None


on_fit_start(trainer, pl_module)[source]

Called when fit begins.

Return type
None


on_predict_end(trainer, pl_module)[source]

Called when predict ends.

Return type
None


on_predict_start(trainer, pl_module)[source]

Called when predict begins.

Return type
None


on_test_end(trainer, pl_module)[source]

Called when the test ends.

Return type
None


on_test_start(trainer, pl_module)[source]

Called when the test begins.

Return type
None


on_train_end(trainer, pl_module)[source]

Called when the train ends.

Return type
None


on_train_start(trainer, pl_module)[source]

Called when the train begins.

Return type
None


on_validation_end(trainer, pl_module)[source]

Called when the validation loop ends.

Return type
None


on_validation_start(trainer, pl_module)[source]

Called when the validation loop begins.

Return type
None