.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gallery/text/rouge.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_gallery_text_rouge.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gallery_text_rouge.py:

ROUGE
===============================

The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric is used to evaluate the quality of generated text
compared to a reference text. It does so by computing the overlap between the two texts, from which precision and
recall values can be derived. The ROUGE score is often used in the context of generative tasks such as text
summarization and machine translation. A major difference from Perplexity is that ROUGE evaluates the actual generated
text, whereas Perplexity evaluates logits.

.. GENERATED FROM PYTHON SOURCE LINES 10-11

Here's a hypothetical Python example demonstrating the usage of unigram ROUGE F-score to evaluate a generative language model:

.. GENERATED FROM PYTHON SOURCE LINES 11-19

.. code-block:: Python
   :lineno-start: 12

   from transformers import AutoTokenizer, pipeline

   from torchmetrics.text import ROUGEScore

   pipe = pipeline("text-generation", model="openai-community/gpt2")
   tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

.. GENERATED FROM PYTHON SOURCE LINES 20-21

Define the prompt and target texts

.. GENERATED FROM PYTHON SOURCE LINES 21-25

.. code-block:: Python
   :lineno-start: 22

   prompt = "The quick brown fox"
   target_text = "The quick brown fox jumps over the lazy dog."

.. GENERATED FROM PYTHON SOURCE LINES 26-27

Generate a sample text using the GPT-2 model

.. GENERATED FROM PYTHON SOURCE LINES 27-33

.. code-block:: Python
   :lineno-start: 28

   sample_text = pipe(prompt, max_length=20, do_sample=True, temperature=0.1, pad_token_id=tokenizer.eos_token_id)[0][
       "generated_text"
   ]
   print(sample_text)
.. rst-class:: sphx-glr-script-out

.. code-block:: none

    The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a

.. GENERATED FROM PYTHON SOURCE LINES 34-35

Calculate the ROUGE of the generated text

.. GENERATED FROM PYTHON SOURCE LINES 35-39

.. code-block:: Python
   :lineno-start: 36

   rouge = ROUGEScore()
   rouge(preds=[sample_text], target=[target_text])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'rouge1_fmeasure': tensor(0.0410), 'rouge1_precision': tensor(0.0213), 'rouge1_recall': tensor(0.5556), 'rouge2_fmeasure': tensor(0.0165), 'rouge2_precision': tensor(0.0085), 'rouge2_recall': tensor(0.2500), 'rougeL_fmeasure': tensor(0.0410), 'rougeL_precision': tensor(0.0213), 'rougeL_recall': tensor(0.5556), 'rougeLsum_fmeasure': tensor(0.0328), 'rougeLsum_precision': tensor(0.0170), 'rougeLsum_recall': tensor(0.4444)}

.. GENERATED FROM PYTHON SOURCE LINES 40-41

By default, the ROUGE score is calculated using a whitespace tokenizer. You can also calculate the ROUGE for the tokens directly:

.. GENERATED FROM PYTHON SOURCE LINES 41-44
.. code-block:: Python
   :lineno-start: 42

   token_rouge = ROUGEScore(tokenizer=lambda text: tokenizer.tokenize(text))
   token_rouge(preds=[sample_text], target=[target_text])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'rouge1_fmeasure': tensor(0.0467), 'rouge1_precision': tensor(0.0243), 'rouge1_recall': tensor(0.6000), 'rouge2_fmeasure': tensor(0.0235), 'rouge2_precision': tensor(0.0122), 'rouge2_recall': tensor(0.3333), 'rougeL_fmeasure': tensor(0.0467), 'rougeL_precision': tensor(0.0243), 'rougeL_recall': tensor(0.6000), 'rougeLsum_fmeasure': tensor(0.0448), 'rougeLsum_precision': tensor(0.0233), 'rougeLsum_recall': tensor(0.6000)}

.. GENERATED FROM PYTHON SOURCE LINES 45-46

Since ROUGE is a text-based metric, it can be used to benchmark decoding strategies. For example, you can compare temperature settings:

.. GENERATED FROM PYTHON SOURCE LINES 46-67

.. code-block:: Python
   :lineno-start: 47

   import matplotlib.pyplot as plt  # noqa: E402

   temperatures = [x * 0.1 for x in range(1, 10)]  # Temperature values from 0.1 to 0.9 with a step of 0.1
   n_samples = 100  # Note that a real benchmark typically requires more data
   average_scores = []

   for temperature in temperatures:
       # Draw a fresh sample for each of the n_samples runs, then average their scores
       scores = []
       for _ in range(n_samples):
           sample_text = pipe(
               prompt, max_length=20, do_sample=True, temperature=temperature, pad_token_id=tokenizer.eos_token_id
           )[0]["generated_text"]
           scores.append(rouge(preds=[sample_text], target=[target_text])["rouge1_fmeasure"])
       average_scores.append(sum(scores) / n_samples)

   # Plot the average ROUGE score for each temperature
   plt.plot(temperatures, average_scores)
   plt.xlabel("Generation temperature")
   plt.ylabel("Average unigram ROUGE F-Score")
   plt.title("ROUGE for varying temperature settings")
   plt.show()

.. image-sg:: /gallery/text/images/sphx_glr_rouge_001.png
   :alt: ROUGE for varying temperature settings
   :srcset: /gallery/text/images/sphx_glr_rouge_001.png
   :class: sphx-glr-single-img
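To make the metric above concrete, the unigram overlap, precision, and recall that ROUGE-1 is built on can be sketched in a few lines of plain Python. This is a minimal illustration under whitespace tokenization, with a hypothetical helper name (``rouge1_scores``); it is not TorchMetrics' actual implementation, which additionally handles text normalization, higher-order n-grams, and the longest-common-subsequence variants.

```python
from collections import Counter


def rouge1_scores(pred: str, target: str) -> dict:
    """Unigram ROUGE-1 via clipped token overlap (whitespace tokenization)."""
    pred_tokens = pred.lower().split()
    target_tokens = target.lower().split()
    # Clipped overlap: each token is matched at most as often as it occurs in either text
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(target_tokens) if target_tokens else 0.0
    fmeasure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


# 4 of the 6 predicted tokens appear among the 9 target tokens
print(rouge1_scores("The quick brown fox ran away", "The quick brown fox jumps over the lazy dog."))
```

The clipped overlap divided by the prediction length gives precision, and divided by the target length gives recall, which is why a short on-topic prediction scores high recall but low precision, as in the GPT-2 example above.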
.. rst-class:: sphx-glr-timing

**Total running time of the script:** (2 minutes 3.700 seconds)

.. _sphx_glr_download_gallery_text_rouge.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: rouge.ipynb <rouge.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: rouge.py <rouge.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: rouge.zip <rouge.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_