.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gallery/text/rouge.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_gallery_text_rouge.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gallery_text_rouge.py:

ROUGE
===============================

The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric is used to evaluate the quality of generated text
compared to a reference text. It does so by computing the overlap between the two texts, from which precision and
recall values can be derived. The ROUGE score is often used in the context of generative tasks such as text
summarization and machine translation. A major difference from Perplexity is that ROUGE evaluates the actual generated
text, whereas Perplexity evaluates logits.

.. GENERATED FROM PYTHON SOURCE LINES 10-11

Here's a hypothetical Python example demonstrating the usage of unigram ROUGE F-score to evaluate a generative language model:

.. GENERATED FROM PYTHON SOURCE LINES 11-19

.. code-block:: Python
   :lineno-start: 12

   from transformers import AutoTokenizer, pipeline

   from torchmetrics.text import ROUGEScore

   pipe = pipeline("text-generation", model="openai-community/gpt2")
   tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

.. GENERATED FROM PYTHON SOURCE LINES 20-21

Define the prompt and target texts

.. GENERATED FROM PYTHON SOURCE LINES 21-25

.. code-block:: Python
   :lineno-start: 22

   prompt = "The quick brown fox"
   target_text = "The quick brown fox jumps over the lazy dog."

.. GENERATED FROM PYTHON SOURCE LINES 26-27

Generate a sample text using the GPT-2 model

.. GENERATED FROM PYTHON SOURCE LINES 27-33

.. code-block:: Python
   :lineno-start: 28

   sample_text = pipe(prompt, max_length=20, do_sample=True, temperature=0.1, pad_token_id=tokenizer.eos_token_id)[0][
       "generated_text"
   ]
   print(sample_text)
.. rst-class:: sphx-glr-script-out

.. code-block:: none

    The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a kick out of your dog. The quick brown foxes are a great way to get a little bit of a

.. GENERATED FROM PYTHON SOURCE LINES 34-35

Calculate the ROUGE of the generated text

.. GENERATED FROM PYTHON SOURCE LINES 35-39

.. code-block:: Python
   :lineno-start: 36

   rouge = ROUGEScore()
   rouge(preds=[sample_text], target=[target_text])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'rouge1_fmeasure': tensor(0.0410), 'rouge1_precision': tensor(0.0213), 'rouge1_recall': tensor(0.5556), 'rouge2_fmeasure': tensor(0.0165), 'rouge2_precision': tensor(0.0085), 'rouge2_recall': tensor(0.2500), 'rougeL_fmeasure': tensor(0.0410), 'rougeL_precision': tensor(0.0213), 'rougeL_recall': tensor(0.5556), 'rougeLsum_fmeasure': tensor(0.0328), 'rougeLsum_precision': tensor(0.0170), 'rougeLsum_recall': tensor(0.4444)}

.. GENERATED FROM PYTHON SOURCE LINES 40-41

By default, the ROUGE score is calculated using a whitespace tokenizer. You can also calculate the ROUGE for the tokens directly:

.. GENERATED FROM PYTHON SOURCE LINES 41-44
.. code-block:: Python
   :lineno-start: 42

   token_rouge = ROUGEScore(tokenizer=lambda text: tokenizer.tokenize(text))
   token_rouge(preds=[sample_text], target=[target_text])

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    {'rouge1_fmeasure': tensor(0.0467), 'rouge1_precision': tensor(0.0243), 'rouge1_recall': tensor(0.6000), 'rouge2_fmeasure': tensor(0.0235), 'rouge2_precision': tensor(0.0122), 'rouge2_recall': tensor(0.3333), 'rougeL_fmeasure': tensor(0.0467), 'rougeL_precision': tensor(0.0243), 'rougeL_recall': tensor(0.6000), 'rougeLsum_fmeasure': tensor(0.0448), 'rougeLsum_precision': tensor(0.0233), 'rougeLsum_recall': tensor(0.6000)}

.. GENERATED FROM PYTHON SOURCE LINES 45-46

Since ROUGE is a text-based metric, it can be used to benchmark decoding strategies. For example, you can compare temperature settings:

.. GENERATED FROM PYTHON SOURCE LINES 46-67

.. code-block:: Python
   :lineno-start: 47

   import matplotlib.pyplot as plt  # noqa: E402

   temperatures = [x * 0.1 for x in range(1, 10)]  # Temperature values from 0.1 to 0.9 with a step of 0.1
   n_samples = 100  # Note that a real benchmark typically requires more data
   average_scores = []

   for temperature in temperatures:
       # Draw a fresh sample for each of the n_samples runs, then average their scores
       scores = []
       for _ in range(n_samples):
           sample_text = pipe(
               prompt, max_length=20, do_sample=True, temperature=temperature, pad_token_id=tokenizer.eos_token_id
           )[0]["generated_text"]
           scores.append(rouge(preds=[sample_text], target=[target_text])["rouge1_fmeasure"])
       average_scores.append(sum(scores) / n_samples)

   # Plot the average ROUGE score for each temperature
   plt.plot(temperatures, average_scores)
   plt.xlabel("Generation temperature")
   plt.ylabel("Average unigram ROUGE F-Score")
   plt.title("ROUGE for varying temperature settings")
   plt.show()

.. image-sg:: /gallery/text/images/sphx_glr_rouge_001.png
   :alt: ROUGE for varying temperature settings
   :srcset: /gallery/text/images/sphx_glr_rouge_001.png
   :class: sphx-glr-single-img
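To make the metric above concrete, the unigram overlap, precision, and recall that ROUGE-1 is built on can be sketched in a few lines of plain Python. This is a minimal illustration under whitespace tokenization, with a hypothetical helper name (``rouge1_scores``); it is not TorchMetrics' actual implementation, which additionally handles text normalization, higher-order n-grams, and the longest-common-subsequence variants.

```python
from collections import Counter


def rouge1_scores(pred: str, target: str) -> dict:
    """Unigram ROUGE-1 via clipped token overlap (whitespace tokenization)."""
    pred_tokens = pred.lower().split()
    target_tokens = target.lower().split()
    # Clipped overlap: each token is matched at most as often as it occurs in either text
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(target_tokens) if target_tokens else 0.0
    fmeasure = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


# 4 of the 6 predicted tokens appear among the 9 target tokens
print(rouge1_scores("The quick brown fox ran away", "The quick brown fox jumps over the lazy dog."))
```

The clipped overlap divided by the prediction length gives precision, and divided by the target length gives recall, which is why a short on-topic prediction scores high recall but low precision, as in the GPT-2 example above.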
.. rst-class:: sphx-glr-timing

**Total running time of the script:** (2 minutes 3.700 seconds)

.. _sphx_glr_download_gallery_text_rouge.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: rouge.ipynb <rouge.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: rouge.py <rouge.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: rouge.zip <rouge.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_