Speech-to-Reverberation Modulation Energy Ratio (SRMR)

Module Interface

class torchmetrics.audio.srmr.SpeechReverberationModulationEnergyRatio(fs, n_cochlear_filters=23, low_freq=125, min_cf=4, max_cf=None, norm=False, fast=False, **kwargs)[source]

Calculate Speech-to-Reverberation Modulation Energy Ratio (SRMR).

SRMR is a non-intrusive metric for speech quality and intelligibility based on a modulation spectral representation of the speech signal. This code is translated from SRMRToolbox and SRMRpy.

As input to forward and update, the metric accepts the following input:

  • preds (Tensor): float tensor with shape (...,time)

As output of forward and compute, the metric returns the following output:

  • srmr (Tensor): float scalar tensor

Hint

Using this metric requires gammatone and torchaudio to be installed. Either install with pip install torchmetrics[audio], or run pip install torchaudio and pip install git+https://github.com/detly/gammatone.

Attention

This implementation is experimental and might not be consistent with the MATLAB implementation SRMRToolbox, especially when fast=True. The slow versions, a) fast=False, norm=False, max_cf=128 and b) fast=False, norm=True, max_cf=30, show only a relatively small inconsistency.
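The two slow configurations named in the note can be written down as keyword arguments. This is a minimal sketch: `SLOW_CONFIGS` and `build_slow_metric` are hypothetical names introduced here for illustration, and the import is deferred so the snippet still loads when the optional audio dependencies are missing.

```python
# Keyword arguments for the two "slow" configurations listed in the note.
# The dictionary keys ("unnormalized", "normalized") are illustrative labels only.
SLOW_CONFIGS = {
    "unnormalized": {"fast": False, "norm": False, "max_cf": 128},
    "normalized": {"fast": False, "norm": True, "max_cf": 30},
}


def build_slow_metric(fs, variant="unnormalized"):
    """Hypothetical helper: build the metric with one of the slow configurations."""
    # Imported lazily because gammatone and torchaudio are optional dependencies.
    from torchmetrics.audio import SpeechReverberationModulationEnergyRatio

    return SpeechReverberationModulationEnergyRatio(fs, **SLOW_CONFIGS[variant])
```

Calling `build_slow_metric(8000, "normalized")` would then return a metric configured as in case b), provided the audio extras are installed.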

Parameters:
  • fs (int) – The sampling rate in Hz

  • n_cochlear_filters (int) – Number of filters in the acoustic filterbank

  • low_freq (float) – Low-frequency cutoff in Hz of the corresponding gammatone filterbank.

  • min_cf (float) – Center frequency in Hz of the first modulation filter.

  • max_cf (Optional[float]) – Center frequency in Hz of the last modulation filter. If None is given, then 30 Hz will be used for norm==True, otherwise 128 Hz will be used.

  • norm (bool) – Use modulation spectrum energy normalization

  • fast (bool) – Use the faster version based on the gammatonegram. Note: this argument is inherited from SRMRpy. Because the translated code is based on PyTorch, setting fast=True may actually slow down the calculation of this metric on GPU.
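To make the roles of n_cochlear_filters and low_freq concrete, the sketch below computes center frequencies on an ERB scale between low_freq and the Nyquist frequency fs/2. It assumes Slaney-style ERB spacing with the constants shown, which is how gammatone filterbanks are commonly laid out; `erb_center_freqs` is a hypothetical helper, not a torchmetrics or gammatone function, and the library's exact values may differ.

```python
import math

EAR_Q = 9.26449  # assumed ERB constant (Glasberg and Moore)
MIN_BW = 24.7    # assumed minimum filter bandwidth in Hz


def erb_center_freqs(fs, n_cochlear_filters=23, low_freq=125.0):
    """Return ERB-spaced center frequencies in Hz, highest first (illustrative)."""
    high_freq = fs / 2  # Nyquist frequency serves as the upper edge
    c = EAR_Q * MIN_BW
    log_ratio = math.log(low_freq + c) - math.log(high_freq + c)
    return [
        -c + math.exp(k / n_cochlear_filters * log_ratio) * (high_freq + c)
        for k in range(1, n_cochlear_filters + 1)
    ]


freqs = erb_center_freqs(8000)
print(len(freqs), freqs[0], freqs[-1])
```

Under these assumptions the lowest center frequency coincides with low_freq, and every other filter sits between it and fs/2.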

Raises:

ModuleNotFoundError – If gammatone or torchaudio package is not installed
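A caller may prefer to check for the optional packages up front rather than catch the error. The following sketch uses only the standard library; `missing_srmr_dependencies` and `REQUIRED_PACKAGES` are hypothetical names chosen for this example.

```python
import importlib.util

# Packages the documentation above names as required for SRMR.
REQUIRED_PACKAGES = ("gammatone", "torchaudio")


def missing_srmr_dependencies(packages=REQUIRED_PACKAGES):
    """Return the names of the given top-level packages that cannot be found."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = missing_srmr_dependencies()
if missing:
    print("SRMR is unavailable; missing packages:", ", ".join(missing))
```

An empty list means both packages are discoverable in the current environment; it does not guarantee that they import without error.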

Example

>>> from torch import randn
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> preds = randn(8000)
>>> srmr = SpeechReverberationModulationEnergyRatio(8000)
>>> srmr(preds)
tensor(0.3191)

plot(val=None, ax=None)[source]

Plot a single or multiple values from the metric.

Parameters:
  • val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.

  • ax (Optional[Axes]) – A matplotlib axis object. If provided, the plot is added to that axis.

Return type:

tuple[Figure, Union[Axes, ndarray]]

Returns:

Figure and Axes object

Raises:

ModuleNotFoundError – If matplotlib is not installed

>>> # Example plotting a single value
>>> import torch
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> metric = SpeechReverberationModulationEnergyRatio(8000)
>>> metric.update(torch.rand(8000))
>>> fig_, ax_ = metric.plot()
>>> # Example plotting multiple values
>>> import torch
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> metric = SpeechReverberationModulationEnergyRatio(8000)
>>> values = []
>>> for _ in range(10):
...     values.append(metric(torch.rand(8000)))
>>> fig_, ax_ = metric.plot(values)

Functional Interface

torchmetrics.functional.audio.srmr.speech_reverberation_modulation_energy_ratio(preds, fs, n_cochlear_filters=23, low_freq=125, min_cf=4, max_cf=None, norm=False, fast=False)[source]

Calculate Speech-to-Reverberation Modulation Energy Ratio (SRMR).

SRMR is a non-intrusive metric for speech quality and intelligibility based on a modulation spectral representation of the speech signal. This code is translated from SRMRToolbox and SRMRpy.

Parameters:
  • preds (Tensor) – shape (..., time)

  • fs (int) – The sampling rate in Hz

  • n_cochlear_filters (int) – Number of filters in the acoustic filterbank

  • low_freq (float) – Low-frequency cutoff in Hz of the corresponding gammatone filterbank.

  • min_cf (float) – Center frequency in Hz of the first modulation filter.

  • max_cf (Optional[float]) – Center frequency in Hz of the last modulation filter. If None is given, then 30 Hz will be used for norm==True, otherwise 128 Hz will be used.

  • norm (bool) – Use modulation spectrum energy normalization

  • fast (bool) – Use the faster version based on the gammatonegram. Note: this argument is inherited from SRMRpy. Because the translated code is based on PyTorch, setting fast=True may actually slow down the calculation of this metric on GPU.
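The min_cf and max_cf parameters bound the modulation filterbank. As a rough illustration, the sketch below spaces center frequencies geometrically between the two. The count of eight filters and the geometric spacing are assumptions based on how SRMR is commonly described, and `modulation_center_freqs` is a hypothetical helper rather than part of torchmetrics.

```python
def modulation_center_freqs(min_cf=4.0, max_cf=128.0, n_filters=8):
    """Return n_filters center frequencies in Hz, log-spaced from min_cf to max_cf."""
    # Constant ratio between neighboring center frequencies.
    factor = (max_cf / min_cf) ** (1.0 / (n_filters - 1))
    return [min_cf * factor**k for k in range(n_filters)]


cfs = modulation_center_freqs()
print([round(cf, 2) for cf in cfs])
```

Passing max_cf=30.0 instead gives the narrower range that corresponds to the normalized configuration.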

Hint

Using this metric requires gammatone and torchaudio to be installed. Either install with pip install torchmetrics[audio], or run pip install torchaudio and pip install git+https://github.com/detly/gammatone.

Attention

This implementation is experimental and might not be consistent with the MATLAB implementation SRMRToolbox, especially when fast=True. The slow versions, a) fast=False, norm=False, max_cf=128 and b) fast=False, norm=True, max_cf=30, show only a relatively small inconsistency.

Return type:

Tensor

Returns:

Tensor with the SRMR value, of shape (...)

Raises:

ModuleNotFoundError – If gammatone or torchaudio package is not installed

Example

>>> from torch import randn
>>> from torchmetrics.functional.audio import speech_reverberation_modulation_energy_ratio
>>> preds = randn(8000)
>>> speech_reverberation_modulation_energy_ratio(preds, 8000)
tensor([0.3191], dtype=torch.float64)