Speech-to-Reverberation Modulation Energy Ratio (SRMR)

Module Interface

class torchmetrics.audio.srmr.SpeechReverberationModulationEnergyRatio(fs, n_cochlear_filters=23, low_freq=125, min_cf=4, max_cf=None, norm=False, fast=False, **kwargs)

Calculate Speech-to-Reverberation Modulation Energy Ratio (SRMR).

SRMR is a non-intrusive metric for speech quality and intelligibility based on a modulation spectral representation of the speech signal. This code is translated from SRMRToolbox and SRMRpy.

As input to forward and update, the metric accepts the following:

preds (Tensor): float tensor with shape (..., time)

As output of forward and compute, the metric returns the following:

srmr (Tensor): float scalar tensor
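
The module returns a scalar, whereas the functional interface returns one value per signal with shape (...). The source does not state how the per-signal scores are reduced. The sketch below assumes they are averaged over every signal seen across update calls. RunningMean is a hypothetical helper written for illustration and is not part of torchmetrics.

```python
class RunningMean:
    """Illustrative accumulator: averages per-signal scores across batches.

    Assumption: the module metric reduces per-signal SRMR scores by a mean.
    """

    def __init__(self):
        self.total = 0.0  # running sum of all scores seen so far
        self.count = 0    # number of scores seen so far

    def update(self, scores):
        """Add one batch of per-signal scores (any iterable of floats)."""
        scores = list(scores)
        self.total += sum(scores)
        self.count += len(scores)

    def compute(self):
        """Return the mean over every score passed to update."""
        if self.count == 0:
            raise ValueError("compute() called before any update()")
        return self.total / self.count


acc = RunningMean()
acc.update([1.0, 3.0])  # first batch with two signals
acc.update([2.0])       # second batch with one signal
mean_score = acc.compute()
```

Under this assumption, batches of different sizes contribute in proportion to the number of signals they contain, not equally per batch.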

Note

Using this metric requires gammatone and torchaudio to be installed. Either install with pip install torchmetrics[audio], or with pip install torchaudio and pip install git+https://github.com/detly/gammatone.
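
Before constructing the metric, it can be useful to check that both optional packages are importable. The following is a minimal sketch using only the standard library. missing_srmr_dependencies is a hypothetical helper name, not a torchmetrics function.

```python
import importlib.util


def missing_srmr_dependencies(required=("gammatone", "torchaudio")):
    """Return the names of the required packages that cannot be found.

    find_spec returns None for a top-level package that is not installed,
    so nothing is actually imported by this check.
    """
    return [name for name in required if importlib.util.find_spec(name) is None]


missing = missing_srmr_dependencies()
if missing:
    print("Install before using SRMR:", ", ".join(missing))
```

An empty list means both packages were found. Otherwise the list names what still needs to be installed.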

Note

This implementation is experimental and might not be consistent with the MATLAB implementation SRMRToolbox, especially the fast implementation. The slow versions, a) fast=False, norm=False, max_cf=128 and b) fast=False, norm=True, max_cf=30, show relatively small inconsistencies.
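
The two slow configurations named in this note can be kept in one place and passed to the constructor as keyword arguments. The following is a minimal sketch. SLOW_CONFIGS, its keys, and build_slow_srmr are hypothetical names chosen for illustration, and the class import is deferred so the snippet loads even when the optional audio dependencies are missing.

```python
# The two settings the note describes as closest to SRMRToolbox.
SLOW_CONFIGS = {
    "unnormalized": {"fast": False, "norm": False, "max_cf": 128},
    "normalized": {"fast": False, "norm": True, "max_cf": 30},
}


def build_slow_srmr(fs, variant="unnormalized"):
    """Create the metric with one of the slow configurations.

    fs is the sampling rate in Hz; variant is a key of SLOW_CONFIGS.
    """
    # Lazy import: requires torchmetrics plus gammatone and torchaudio.
    from torchmetrics.audio import SpeechReverberationModulationEnergyRatio

    return SpeechReverberationModulationEnergyRatio(fs, **SLOW_CONFIGS[variant])
```

Calling build_slow_srmr(8000, "normalized") would then construct the metric with fast=False, norm=True, max_cf=30.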

Parameters:

fs (int) – The sampling rate.
n_cochlear_filters (int) – Number of filters in the acoustic filterbank.
low_freq (float) – Determines the frequency cutoff for the corresponding gammatone filterbank.
min_cf (float) – Center frequency in Hz of the first modulation filter.
max_cf (Optional[float]) – Center frequency in Hz of the last modulation filter. If None is given, 30 Hz is used when norm=True; otherwise 128 Hz is used.
norm (bool) – Use modulation spectrum energy normalization.
fast (bool) – Use the faster version based on the gammatonegram. Note: this argument is inherited from SRMRpy. Because the translated code is based on PyTorch, setting fast=True may slow down the computation of this metric on GPU.

Raises:

ModuleNotFoundError – If gammatone or torchaudio package is not installed

Example

>>> import torch
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> g = torch.manual_seed(1)
>>> preds = torch.randn(8000)
>>> srmr = SpeechReverberationModulationEnergyRatio(8000)
>>> srmr(preds)
tensor(0.3354)

plot(val=None, ax=None)

Plot a single or multiple values from the metric.

Parameters:

val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute, or a list of these results. If no value is provided, metric.compute is called automatically and its result is plotted.
ax (Optional[Axes]) – A matplotlib axis object. If provided, the plot is added to that axis.

Return type:

Tuple[Figure, Union[Axes, ndarray]]

Returns:

Figure and Axes object

Raises:

ModuleNotFoundError – If matplotlib is not installed

>>> # Example plotting a single value
>>> import torch
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> metric = SpeechReverberationModulationEnergyRatio(8000)
>>> metric.update(torch.rand(8000))
>>> fig_, ax_ = metric.plot()

../_images/speech_reverberation_modulation_energy_ratio-1.png

>>> # Example plotting multiple values
>>> import torch
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> metric = SpeechReverberationModulationEnergyRatio(8000)
>>> values = []
>>> for _ in range(10):
...     values.append(metric(torch.rand(8000)))
>>> fig_, ax_ = metric.plot(values)

../_images/speech_reverberation_modulation_energy_ratio-2.png

Functional Interface

torchmetrics.functional.audio.srmr.speech_reverberation_modulation_energy_ratio(preds, fs, n_cochlear_filters=23, low_freq=125, min_cf=4, max_cf=None, norm=False, fast=False)

Calculate Speech-to-Reverberation Modulation Energy Ratio (SRMR).

SRMR is a non-intrusive metric for speech quality and intelligibility based on a modulation spectral representation of the speech signal. This code is translated from SRMRToolbox and SRMRpy.

Parameters:

preds (Tensor) – Float tensor with shape (..., time).
fs (int) – The sampling rate.
n_cochlear_filters (int) – Number of filters in the acoustic filterbank.
low_freq (float) – Determines the frequency cutoff for the corresponding gammatone filterbank.
min_cf (float) – Center frequency in Hz of the first modulation filter.
max_cf (Optional[float]) – Center frequency in Hz of the last modulation filter. If None is given, 30 Hz is used when norm=True; otherwise 128 Hz is used.
norm (bool) – Use modulation spectrum energy normalization.
fast (bool) – Use the faster version based on the gammatonegram. Note: this argument is inherited from SRMRpy. Because the translated code is based on PyTorch, setting fast=True may slow down the computation of this metric on GPU.

Note

Using this metric requires gammatone and torchaudio to be installed. Either install with pip install torchmetrics[audio], or with pip install torchaudio and pip install git+https://github.com/detly/gammatone.

Note

This implementation is experimental and might not be consistent with the MATLAB implementation SRMRToolbox, especially the fast implementation. The slow versions, a) fast=False, norm=False, max_cf=128 and b) fast=False, norm=True, max_cf=30, show relatively small inconsistencies.

Returns:

srmr value, shape (...)

Return type:

Tensor

Raises:

ModuleNotFoundError – If gammatone or torchaudio package is not installed

Example

>>> import torch
>>> from torchmetrics.functional.audio import speech_reverberation_modulation_energy_ratio
>>> g = torch.manual_seed(1)
>>> preds = torch.randn(8000)
>>> speech_reverberation_modulation_energy_ratio(preds, 8000)
tensor([0.3354], dtype=torch.float64)