Visual Information Fidelity (VIF)

Module Interface

class torchmetrics.image.VisualInformationFidelity(sigma_n_sq=2.0, reduction='mean', **kwargs)[source]

Compute Pixel-Based Visual Information Fidelity (VIF-P).

As input to forward and update the metric accepts the following input:

  • preds (Tensor): Predictions from model of shape (N,C,H,W) with H,W ≥ 41

  • target (Tensor): Ground truth values of shape (N,C,H,W) with H,W ≥ 41

As output of forward and compute the metric returns the following output:

  • vif-p (Tensor):
    • If reduction='mean' (default), returns a scalar tensor with the mean VIF score over the batch.

    • If reduction='none', returns a tensor of shape (N,) with VIF values per sample.

Parameters:
  • sigma_n_sq (float) – variance of the visual noise

  • reduction (Literal['mean', 'none']) –

    The reduction method for aggregating scores.

    • 'mean': return the average VIF across the batch.

    • 'none': return a VIF score for each sample in the batch.

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.

Example

>>> import torch
>>> from torchmetrics.image import VisualInformationFidelity
>>> preds = torch.randn(32, 3, 41, 41, generator=torch.Generator().manual_seed(42))
>>> target = torch.randn(32, 3, 41, 41, generator=torch.Generator().manual_seed(43))
>>> vif_mean = VisualInformationFidelity(reduction='mean')
>>> vif_mean(preds, target)
tensor(0.0032)
>>> vif_none = VisualInformationFidelity(reduction='none')
>>> vif_none(preds, target)
tensor([0.0040, 0.0049, 0.0017, 0.0039, 0.0041, 0.0043, 0.0030, 0.0028, 0.0012,
        0.0067, 0.0010, 0.0014, 0.0030, 0.0048, 0.0050, 0.0038, 0.0037, 0.0025,
        0.0041, 0.0019, 0.0007, 0.0034, 0.0037, 0.0016, 0.0026, 0.0021, 0.0038,
        0.0033, 0.0031, 0.0020, 0.0036, 0.0057])

Functional Interface

torchmetrics.functional.image.visual_information_fidelity(preds, target, sigma_n_sq=2.0, reduction='mean')[source]

Compute Pixel-Based Visual Information Fidelity (VIF-P).

VIF is a full-reference metric that measures the amount of visual information preserved in a distorted image compared to the reference image.

Parameters:
  • preds (Tensor) – Predicted images of shape (N, C, H, W). Height and width must be at least 41.

  • target (Tensor) – Ground truth images of shape (N, C, H, W). Must match preds in shape.

  • sigma_n_sq (float) – Variance of the visual noise. Default: 2.0.

  • reduction (Literal['mean', 'none']) –

    Method for reducing the metric across the batch.

    • 'mean': return a scalar tensor averaged over the batch.

    • 'none': return a VIF score for each sample as a 1D tensor of shape (N,).

Returns:

VIF score(s). The shape depends on the reduction argument:
  • If reduction="mean", returns a scalar tensor.

  • If reduction="none", returns a tensor of shape (N,).

Return type:

torch.Tensor

Raises:
  • ValueError – If input dimensions are smaller than 41x41.

  • ValueError – If preds and target shapes don’t match.

  • ValueError – If reduction is not "mean" or "none".

Example

>>> import torch
>>> from torchmetrics.functional.image import visual_information_fidelity
>>> preds = torch.randn(4, 3, 41, 41, generator=torch.Generator().manual_seed(42))
>>> target = torch.randn(4, 3, 41, 41, generator=torch.Generator().manual_seed(43))
>>> visual_information_fidelity(preds, target, reduction="none")
tensor([0.0040, 0.0049, 0.0017, 0.0039])