ROUGE Score

Module Interface

class torchmetrics.text.rouge.ROUGEScore(use_stemmer=False, normalizer=None, tokenizer=None, accumulate='best', rouge_keys=('rouge1', 'rouge2', 'rougeL', 'rougeLsum'), **kwargs)[source]

Calculate Rouge Score, used for automatic summarization.

This implementation should imitate the behaviour of the rouge-score package Python ROUGE Implementation

As input to forward and update the metric accepts the following input:

  • preds (Sequence): An iterable of predicted sentences or a single predicted sentence

  • target (Sequence): An iterable of target sentences or an iterable of interables of target sentences or a single target sentence

As output of forward and compute the metric returns the following output:

  • rouge (Dict): A dictionary of tensor rouge scores for each input str rouge key

Parameters:
  • use_stemmer (bool) – Use Porter stemmer to strip word suffixes to improve matching.

  • normalizer (Optional[Callable[[str], str]]) – A user’s own normalizer function. If this is None, replacing any non-alpha-numeric characters with spaces is default. This function must take a str and return a str.

  • tokenizer (Optional[Callable[[str], Sequence[str]]]) – A user’s own tokenizer function. If this is None, splitting by spaces is default This function must take a str and return Sequence[str]

  • accumulate (Literal['avg', 'best']) –

    Useful in case of multi-reference rouge score.

    • avg takes the avg of all references with respect to predictions

    • best takes the best fmeasure score obtained between prediction and multiple corresponding references.

  • rouge_keys (Union[str, tuple[str, ...]]) – A list of rouge types to calculate. Keys that are allowed are rougeL, rougeLsum, and rouge1 through rouge9.

  • kwargs (Any) – Additional keyword arguments, see Advanced metric settings for more info.

Example

>>> from torchmetrics.text.rouge import ROUGEScore
>>> preds = "My name is John"
>>> target = "Is your name John"
>>> rouge = ROUGEScore()
>>> from pprint import pprint
>>> pprint(rouge(preds, target))
{'rouge1_fmeasure': tensor(0.7500),
 'rouge1_precision': tensor(0.7500),
 'rouge1_recall': tensor(0.7500),
 'rouge2_fmeasure': tensor(0.),
 'rouge2_precision': tensor(0.),
 'rouge2_recall': tensor(0.),
 'rougeL_fmeasure': tensor(0.5000),
 'rougeL_precision': tensor(0.5000),
 'rougeL_recall': tensor(0.5000),
 'rougeLsum_fmeasure': tensor(0.5000),
 'rougeLsum_precision': tensor(0.5000),
 'rougeLsum_recall': tensor(0.5000)}
Raises:
  • ValueError – If the python packages nltk is not installed.

  • ValueError – If any of the rouge_keys does not belong to the allowed set of keys.

plot(val=None, ax=None)[source]

Plot a single or multiple values from the metric.

Parameters:
  • val (Union[Tensor, Sequence[Tensor], None]) – Either a single result from calling metric.forward or metric.compute or a list of these results. If no value is provided, will automatically call metric.compute and plot that result.

  • ax (Optional[Axes]) – An matplotlib axis object. If provided will add plot to that axis

Return type:

tuple[Figure, Union[Axes, ndarray]]

Returns:

Figure and Axes object

Raises:

ModuleNotFoundError – If matplotlib is not installed

>>> # Example plotting a single value
>>> from torchmetrics.text.rouge import ROUGEScore
>>> metric = ROUGEScore()
>>> preds = "My name is John"
>>> target = "Is your name John"
>>> metric.update(preds, target)
>>> fig_, ax_ = metric.plot()
../_images/rouge_score-1.png
>>> # Example plotting multiple values
>>> from torchmetrics.text.rouge import ROUGEScore
>>> metric = ROUGEScore()
>>> preds = "My name is John"
>>> target = "Is your name John"
>>> values = [ ]
>>> for _ in range(10):
...     values.append(metric(preds, target))
>>> fig_, ax_ = metric.plot(values)
../_images/rouge_score-2.png

Functional Interface

torchmetrics.functional.text.rouge.rouge_score(preds, target, accumulate='best', use_stemmer=False, normalizer=None, tokenizer=None, rouge_keys=('rouge1', 'rouge2', 'rougeL', 'rougeLsum'))[source]

Calculate Calculate Rouge Score , used for automatic summarization.

Parameters:
  • preds (Union[str, Sequence[str]]) – An iterable of predicted sentences or a single predicted sentence.

  • target (Union[str, Sequence[str], Sequence[Sequence[str]]]) – An iterable of iterables of target sentences or an iterable of target sentences or a single target sentence.

  • accumulate (Literal['avg', 'best']) –

    Useful in case of multi-reference rouge score.

    • avg takes the avg of all references with respect to predictions

    • best takes the best fmeasure score obtained between prediction and multiple corresponding references.

  • use_stemmer (bool) – Use Porter stemmer to strip word suffixes to improve matching.

  • normalizer (Optional[Callable[[str], str]]) – A user’s own normalizer function. If this is None, replacing any non-alpha-numeric characters with spaces is default. This function must take a str and return a str.

  • tokenizer (Optional[Callable[[str], Sequence[str]]]) – A user’s own tokenizer function. If this is None, splitting by spaces is default This function must take a str and return Sequence[str]

  • rouge_keys (Union[str, tuple[str, ...]]) – A list of rouge types to calculate. Keys that are allowed are rougeL, rougeLsum, and rouge1 through rouge9.

Return type:

dict[str, Tensor]

Returns:

Python dictionary of rouge scores for each input rouge key.

Example

>>> from torchmetrics.functional.text.rouge import rouge_score
>>> preds = "My name is John"
>>> target = "Is your name John"
>>> from pprint import pprint
>>> pprint(rouge_score(preds, target))
{'rouge1_fmeasure': tensor(0.7500),
 'rouge1_precision': tensor(0.7500),
 'rouge1_recall': tensor(0.7500),
 'rouge2_fmeasure': tensor(0.),
 'rouge2_precision': tensor(0.),
 'rouge2_recall': tensor(0.),
 'rougeL_fmeasure': tensor(0.5000),
 'rougeL_precision': tensor(0.5000),
 'rougeL_recall': tensor(0.5000),
 'rougeLsum_fmeasure': tensor(0.5000),
 'rougeLsum_precision': tensor(0.5000),
 'rougeLsum_recall': tensor(0.5000)}
Raises:
  • ModuleNotFoundError – If the python package nltk is not installed.

  • ValueError – If any of the rouge_keys does not belong to the allowed set of keys.

References

[1] ROUGE: A Package for Automatic Evaluation of Summaries by Chin-Yew Lin. https://aclanthology.org/W04-1013/