Edit Distance
Module Interface
- class torchmetrics.text.EditDistance(substitution_cost=1, reduction='mean', **kwargs)
Calculates the Levenshtein edit distance between two sequences.
The edit distance is the number of characters that need to be substituted, inserted, or deleted, to transform the predicted text into the reference text. The lower the distance, the more accurate the model is considered to be.
Implementation is similar to nltk.edit_distance.
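For intuition, the recurrence behind such a distance can be sketched in a few lines of plain Python. This is an illustrative dynamic-programming sketch, not the TorchMetrics implementation; the helper name levenshtein is made up for the example:

# Illustrative sketch of the Levenshtein recurrence with a configurable
# substitution cost (not the library's actual implementation).
def levenshtein(pred: str, target: str, substitution_cost: int = 1) -> int:
    # dp[i][j] = cost of transforming pred[:i] into target[:j]
    dp = [[0] * (len(target) + 1) for _ in range(len(pred) + 1)]
    for i in range(len(pred) + 1):
        dp[i][0] = i  # i deletions to empty out pred[:i]
    for j in range(len(target) + 1):
        dp[0][j] = j  # j insertions to build target[:j] from an empty string
    for i in range(1, len(pred) + 1):
        for j in range(1, len(target) + 1):
            sub = 0 if pred[i - 1] == target[j - 1] else substitution_cost
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + sub,  # substitution (or match)
            )
    return dp[-1][-1]

print(levenshtein("rain", "shine"))                       # 3
print(levenshtein("rain", "shine", substitution_cost=2))  # 5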
As input to forward and update the metric accepts the following input:

- preds (Sequence): An iterable of predicted texts (strings)
- target (Sequence): An iterable of reference texts (strings)

As output of forward and compute the metric returns the following output:

- eed (Tensor): A tensor with the edit distance score. If reduction is set to 'none' or None, this has shape (N,), where N is the batch size. Otherwise, this is a scalar.
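Because the metric is stateful, update can be called once per batch and compute reduces over everything seen so far. A minimal sketch of that cycle (the final value in the comment assumes the usual TorchMetrics behaviour of reducing over all accumulated samples):

from torchmetrics.text import EditDistance

metric = EditDistance(reduction="mean")

# feed batches one at a time; update() only accumulates state
metric.update(["rain"], ["shine"])         # distance 3
metric.update(["lnaguaeg"], ["language"])  # distance 4

# compute() reduces over everything accumulated so far
print(metric.compute())  # expected: tensor(3.5000), the mean over both samples
metric.reset()           # clear state before the next evaluation run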
- Parameters:
  - substitution_cost (int) – The cost of substituting one character for another.
  - reduction (Optional[Literal['mean', 'sum', 'none']]) – a method to reduce metric score over samples.
    - 'mean': takes the mean over samples
    - 'sum': takes the sum over samples
    - None or 'none': return the score per sample
  - kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.
- Example::
Basic example with two strings. Going from “rain” -> “sain” -> “shin” -> “shine” takes 3 edits:
>>> from torchmetrics.text import EditDistance
>>> metric = EditDistance()
>>> metric(["rain"], ["shine"])
tensor(3.)
- Example::
Basic example with two strings and substitution cost of 2. Going from “rain” -> “sain” -> “shin” -> “shine” takes 3 edits, where two of them are substitutions:
>>> from torchmetrics.text import EditDistance
>>> metric = EditDistance(substitution_cost=2)
>>> metric(["rain"], ["shine"])
tensor(5.)
- Example::
Multiple strings example:
>>> from torchmetrics.text import EditDistance
>>> metric = EditDistance(reduction=None)
>>> metric(["rain", "lnaguaeg"], ["shine", "language"])
tensor([3, 4], dtype=torch.int32)
>>> metric = EditDistance(reduction="mean")
>>> metric(["rain", "lnaguaeg"], ["shine", "language"])
tensor(3.5000)
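The 'sum' reduction is documented above but not exercised by the examples; a short sketch of what it would look like on the same inputs (the value in the comment assumes the per-sample distances 3 and 4 are simply added):

from torchmetrics.text import EditDistance

# 'sum' reduction adds up the per-sample distances instead of averaging them
metric = EditDistance(reduction="sum")
print(metric(["rain", "lnaguaeg"], ["shine", "language"]))  # expected: tensor(7), i.e. 3 + 4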
- plot(val=None, ax=None)
Plot a single or multiple values from the metric.
- Parameters:
  - val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.
  - ax (Optional[Axes]) – A matplotlib axis object. If provided, the plot will be added to that axis.
- Returns:
  Figure and Axes object
- Raises:
ModuleNotFoundError – If matplotlib is not installed
>>> # Example plotting a single value
>>> from torchmetrics.text import EditDistance
>>> metric = EditDistance()
>>> preds = ["this is the prediction", "there is an other sample"]
>>> target = ["this is the reference", "there is another one"]
>>> metric.update(preds, target)
>>> fig_, ax_ = metric.plot()
>>> # Example plotting multiple values
>>> from torchmetrics.text import EditDistance
>>> metric = EditDistance()
>>> preds = ["this is the prediction", "there is an other sample"]
>>> target = ["this is the reference", "there is another one"]
>>> values = [ ]
>>> for _ in range(10):
...     values.append(metric(preds, target))
>>> fig_, ax_ = metric.plot(values)
Functional Interface
- torchmetrics.functional.text.edit_distance(preds, target, substitution_cost=1, reduction='mean')
Calculates the Levenshtein edit distance between two sequences.
The edit distance is the number of characters that need to be substituted, inserted, or deleted, to transform the predicted text into the reference text. The lower the distance, the more accurate the model is considered to be.
Implementation is similar to nltk.edit_distance.
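For a single batch the functional call is expected to agree with the module interface above. A minimal sketch of that equivalence, assuming both sides use the default 'mean' reduction:

import torch
from torchmetrics.text import EditDistance
from torchmetrics.functional.text import edit_distance

preds = ["rain", "lnaguaeg"]
target = ["shine", "language"]

# one-shot functional call vs. a freshly constructed module, same defaults
functional_score = edit_distance(preds, target)
module_score = EditDistance()(preds, target)

# expected to match for a single batch (both default to reduction='mean')
assert torch.allclose(functional_score, module_score)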
- Parameters:
  - preds (Union[str, Sequence[str]]) – An iterable of predicted texts (strings).
  - target (Union[str, Sequence[str]]) – An iterable of reference texts (strings).
  - substitution_cost (int) – The cost of substituting one character for another.
  - reduction (Optional[Literal['mean', 'sum', 'none']]) – a method to reduce metric score over samples.
    - 'mean': takes the mean over samples
    - 'sum': takes the sum over samples
    - None or 'none': return the score per sample
- Raises:
  - ValueError – If preds and target do not have the same length.
  - ValueError – If preds or target contain non-string values.
- Return type:
  Tensor
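A short sketch of the length check from the Raises section above (the exact error message is not reproduced here):

from torchmetrics.functional.text import edit_distance

try:
    # two predictions but only one reference -> lengths differ
    edit_distance(["rain", "lnaguaeg"], ["shine"])
except ValueError as err:
    print(f"ValueError raised as documented: {err}")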
- Example::
Basic example with two strings. Going from “rain” -> “sain” -> “shin” -> “shine” takes 3 edits:
>>> from torchmetrics.functional.text import edit_distance >>> edit_distance(["rain"], ["shine"]) tensor(3.)
- Example::
Basic example with two strings and substitution cost of 2. Going from “rain” -> “sain” -> “shin” -> “shine” takes 3 edits, where two of them are substitutions:
>>> from torchmetrics.functional.text import edit_distance >>> edit_distance(["rain"], ["shine"], substitution_cost=2) tensor(5.)
- Example::
Multiple strings example:
>>> from torchmetrics.functional.text import edit_distance >>> edit_distance(["rain", "lnaguaeg"], ["shine", "language"], reduction=None) tensor([3, 4], dtype=torch.int32) >>> edit_distance(["rain", "lnaguaeg"], ["shine", "language"], reduction="mean") tensor(3.5000)