Extended Edit Distance

Module Interface

class torchmetrics.text.ExtendedEditDistance(language='en', return_sentence_level_score=False, alpha=2.0, rho=0.3, deletion=0.2, insertion=1.0, **kwargs)

Compute the extended edit distance (EED) score for a string or a list of strings.

The metric utilises the Levenshtein distance and extends it by adding a jump operation.
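For orientation, the plain Levenshtein distance that the metric builds on can be sketched in a few lines. This is only an illustration: the function name `levenshtein` is hypothetical, it uses unit costs, and it leaves out the jump operation and the coverage cost, so it is not the torchmetrics implementation.

```python
def levenshtein(hyp: str, ref: str) -> int:
    """Character-level Levenshtein distance with unit costs (illustrative only)."""
    # previous[j] is the distance between the hyp prefix processed so far and ref[:j]
    previous = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        current = [i]  # turning hyp[:i] into the empty string takes i deletions
        for j, r in enumerate(ref, start=1):
            current.append(
                min(
                    previous[j] + 1,             # delete h
                    current[j - 1] + 1,          # insert r
                    previous[j - 1] + (h != r),  # match (cost 0) or substitute (cost 1)
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

EED adds a jump operation on top of these edit operations, with its own penalty (alpha).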

The forward and update methods accept the following inputs:

preds (Sequence): An iterable of hypothesis sentences (the hypothesis corpus)
target (Sequence): An iterable of iterables of reference sentences (the reference corpus)

The forward and compute methods return the following output:

eed (Tensor): A tensor with the extended edit distance score

Parameters:

language (Literal['en', 'ja']) – Language used in sentences. Only supports English (en) and Japanese (ja) for now.
return_sentence_level_score (bool) – Whether to return the sentence-level EED scores in addition to the overall score.
alpha (float) – Optimal jump penalty; the penalty for jumps between characters.
rho (float) – Coverage cost; the penalty for repetition of characters.
deletion (float) – Penalty for deleting a character.
insertion (float) – Penalty for inserting or substituting a character.
kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.

Example

>>> from torchmetrics.text import ExtendedEditDistance
>>> preds = ["this is the prediction", "here is an other sample"]
>>> target = ["this is the reference", "here is another one"]
>>> eed = ExtendedEditDistance()
>>> eed(preds=preds, target=target)
tensor(0.3078)

plot(val=None, ax=None)

Plot a single or multiple values from the metric.

Parameters:

val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.
ax (Optional[Axes]) – A matplotlib axis object. If provided, the plot is added to that axis.

Return type:

tuple[Figure, Union[Axes, ndarray]]

Returns:

Figure and Axes object

Raises:

ModuleNotFoundError – If matplotlib is not installed

>>> # Example plotting a single value
>>> from torchmetrics.text import ExtendedEditDistance
>>> metric = ExtendedEditDistance()
>>> preds = ["this is the prediction", "there is an other sample"]
>>> target = ["this is the reference", "there is another one"]
>>> metric.update(preds, target)
>>> fig_, ax_ = metric.plot()

>>> # Example plotting multiple values
>>> from torchmetrics.text import ExtendedEditDistance
>>> metric = ExtendedEditDistance()
>>> preds = ["this is the prediction", "there is an other sample"]
>>> target = ["this is the reference", "there is another one"]
>>> values = []
>>> for _ in range(10):
...     values.append(metric(preds, target))
>>> fig_, ax_ = metric.plot(values)

Functional Interface

torchmetrics.functional.text.extended_edit_distance(preds, target, language='en', return_sentence_level_score=False, alpha=2.0, rho=0.3, deletion=0.2, insertion=1.0)

Compute the extended edit distance (EED) score [1] for a string or a list of strings.

The metric utilises the Levenshtein distance and extends it by adding a jump operation.

Parameters:

preds (Union[str, Sequence[str]]) – A hypothesis sentence or an iterable of hypothesis sentences (the hypothesis corpus).
target (Sequence[Union[str, Sequence[str]]]) – An iterable of reference sentences, or an iterable of iterables of reference sentences (the reference corpus).
language (Literal['en', 'ja']) – Language used in sentences. Only supports English (en) and Japanese (ja) for now. Defaults to en.
return_sentence_level_score (bool) – Whether to return the sentence-level EED scores in addition to the overall score.
alpha (float) – Optimal jump penalty; the penalty for jumps between characters.
rho (float) – Coverage cost; the penalty for repetition of characters.
deletion (float) – Penalty for deleting a character.
insertion (float) – Penalty for inserting or substituting a character.

Return type:

Union[Tensor, tuple[Tensor, Tensor]]

Returns:

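The extended edit distance score as a tensor. If return_sentence_level_score is True, a tuple of the overall score and a tensor of sentence-level scores.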

Example

>>> from torchmetrics.functional.text import extended_edit_distance
>>> preds = ["this is the prediction", "here is an other sample"]
>>> target = ["this is the reference", "here is another one"]
>>> extended_edit_distance(preds=preds, target=target)
tensor(0.3078)

References

[1] P. Stanchev, W. Wang, and H. Ney, “EED: Extended Edit Distance Measure for Machine Translation”, submitted to WMT 2019.