Speech-to-Reverberation Modulation Energy Ratio (SRMR)
Module Interface
- class torchmetrics.audio.srmr.SpeechReverberationModulationEnergyRatio(fs, n_cochlear_filters=23, low_freq=125, min_cf=4, max_cf=None, norm=False, fast=False, **kwargs)
Calculate Speech-to-Reverberation Modulation Energy Ratio (SRMR).
SRMR is a non-intrusive metric for speech quality and intelligibility based on a modulation spectral representation of the speech signal. This code is translated from SRMRToolbox and SRMRpy.
As input to forward and update, the metric accepts the following input:
- preds (Tensor): float tensor with shape (..., time)
As output of forward and compute, the metric returns the following output:
- srmr (Tensor): float scalar tensor
Note
Using this metric requires gammatone and torchaudio to be installed. Either install with pip install torchmetrics[audio], or with pip install torchaudio and pip install git+https://github.com/detly/gammatone.
Note
This implementation is experimental and might not be consistent with the MATLAB implementation SRMRToolbox, especially the fast implementation. The slow versions, a) fast=False, norm=False, max_cf=128 and b) fast=False, norm=True, max_cf=30, show relatively small inconsistencies.
- Parameters:
n_cochlear_filters (int) – Number of filters in the acoustic filterbank.
low_freq (float) – Frequency cutoff for the corresponding gammatone filterbank.
min_cf (float) – Center frequency in Hz of the first modulation filter.
max_cf (Optional[float]) – Center frequency in Hz of the last modulation filter. If None is given, 30 Hz is used when norm=True, otherwise 128 Hz is used.
fast (bool) – Use the faster version based on the gammatonegram. Note: this argument is inherited from SRMRpy. As the translated code is based on PyTorch, setting fast=True may slow down the calculation of this metric on GPU.
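The fallback for max_cf can be sketched in plain Python. This is a minimal illustration rather than the library's code: it follows the pairing given in the note on the slow configurations (norm=False with 128 Hz, norm=True with 30 Hz), and resolve_max_cf is a hypothetical helper name that is not part of torchmetrics.

```python
from typing import Optional


def resolve_max_cf(max_cf: Optional[float], norm: bool) -> float:
    """Hypothetical helper: pick the last modulation-filter center frequency in Hz."""
    # An explicit value always wins over the default.
    if max_cf is not None:
        return float(max_cf)
    # Assumed default pairing: 30 Hz with normalization, 128 Hz without.
    return 30.0 if norm else 128.0


print(resolve_max_cf(None, norm=False))  # 128.0
print(resolve_max_cf(None, norm=True))   # 30.0
print(resolve_max_cf(64, norm=True))     # 64.0
```

Passing max_cf explicitly avoids relying on the default and makes the chosen configuration visible at the call site.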
- Raises:
ModuleNotFoundError – If the gammatone or torchaudio package is not installed.
Example
>>> from torch import randn
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> preds = randn(8000)
>>> srmr = SpeechReverberationModulationEnergyRatio(8000)
>>> srmr(preds)
tensor(0.3191)
- plot(val=None, ax=None)
Plot a single or multiple values from the metric.
- Parameters:
val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute, or a list of these results. If no value is provided, metric.compute is called automatically and its result is plotted.
ax (Optional[Axes]) – A matplotlib axis object. If provided, the plot is added to that axis.
- Returns:
Figure and Axes object
- Raises:
ModuleNotFoundError – If matplotlib is not installed
>>> # Example plotting a single value
>>> import torch
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> metric = SpeechReverberationModulationEnergyRatio(8000)
>>> metric.update(torch.rand(8000))
>>> fig_, ax_ = metric.plot()

>>> # Example plotting multiple values
>>> import torch
>>> from torchmetrics.audio import SpeechReverberationModulationEnergyRatio
>>> metric = SpeechReverberationModulationEnergyRatio(8000)
>>> values = []
>>> for _ in range(10):
...     values.append(metric(torch.rand(8000)))
>>> fig_, ax_ = metric.plot(values)
Functional Interface
- torchmetrics.functional.audio.srmr.speech_reverberation_modulation_energy_ratio(preds, fs, n_cochlear_filters=23, low_freq=125, min_cf=4, max_cf=None, norm=False, fast=False)
Calculate Speech-to-Reverberation Modulation Energy Ratio (SRMR).
SRMR is a non-intrusive metric for speech quality and intelligibility based on a modulation spectral representation of the speech signal. This code is translated from SRMRToolbox and SRMRpy.
- Parameters:
n_cochlear_filters (int) – Number of filters in the acoustic filterbank.
low_freq (float) – Frequency cutoff for the corresponding gammatone filterbank.
min_cf (float) – Center frequency in Hz of the first modulation filter.
max_cf (Optional[float]) – Center frequency in Hz of the last modulation filter. If None is given, 30 Hz is used when norm=True, otherwise 128 Hz is used.
fast (bool) – Use the faster version based on the gammatonegram. Note: this argument is inherited from SRMRpy. As the translated code is based on PyTorch, setting fast=True may slow down the calculation of this metric on GPU.
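To make the roles of min_cf and max_cf concrete, the sketch below lays out modulation-filter center frequencies between the two bounds. It assumes eight logarithmically spaced filters, which is the layout SRMRpy is understood to use; neither the filter count nor the spacing rule is stated on this page, and modulation_center_frequencies is a hypothetical helper, not a torchmetrics function.

```python
from typing import List


def modulation_center_frequencies(min_cf: float, max_cf: float, n_filters: int = 8) -> List[float]:
    """Hypothetical helper: log-spaced center frequencies from min_cf to max_cf (Hz)."""
    if n_filters < 2:
        raise ValueError("n_filters must be at least 2")
    if not 0 < min_cf < max_cf:
        raise ValueError("expected 0 < min_cf < max_cf")
    # Assumption: neighbouring center frequencies differ by a constant ratio.
    spacing = (max_cf / min_cf) ** (1.0 / (n_filters - 1))
    return [min_cf * spacing**k for k in range(n_filters)]


# Bounds taken from configuration a) in the note: min_cf=4, max_cf=128.
cfs = modulation_center_frequencies(min_cf=4.0, max_cf=128.0)
print([round(cf, 2) for cf in cfs])
```

Under this assumption, lowering max_cf to 30 Hz compresses the same number of filters into a narrower modulation range, which is why the two slow configurations are not interchangeable.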
Note
Using this metric requires gammatone and torchaudio to be installed. Either install with pip install torchmetrics[audio], or with pip install torchaudio and pip install git+https://github.com/detly/gammatone.
Note
This implementation is experimental and might not be consistent with the MATLAB implementation SRMRToolbox, especially the fast implementation. The slow versions, a) fast=False, norm=False, max_cf=128 and b) fast=False, norm=True, max_cf=30, show relatively small inconsistencies.
- Returns:
Scalar tensor with the SRMR value, with shape (...)
- Raises:
ModuleNotFoundError – If the gammatone or torchaudio package is not installed.
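Since the metric raises ModuleNotFoundError when an optional dependency is absent, it can help to check for the packages up front. The following is a minimal sketch using only the standard library; missing_srmr_dependencies is a hypothetical helper and not part of torchmetrics.

```python
from importlib.util import find_spec
from typing import List


def missing_srmr_dependencies() -> List[str]:
    """Hypothetical helper: names of the optional packages that cannot be found."""
    required = ("gammatone", "torchaudio")
    # find_spec returns None when a top-level package is not installed.
    return [name for name in required if find_spec(name) is None]


missing = missing_srmr_dependencies()
if missing:
    print("SRMR unavailable, missing packages:", ", ".join(missing))
else:
    print("gammatone and torchaudio were found")
```

A check like this lets a script skip or disable SRMR evaluation gracefully instead of failing at the first metric call.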
Example
>>> from torch import randn
>>> from torchmetrics.functional.audio import speech_reverberation_modulation_energy_ratio
>>> preds = randn(8000)
>>> speech_reverberation_modulation_energy_ratio(preds, 8000)
tensor([0.3191], dtype=torch.float64)