.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gallery/text/perplexity.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_gallery_text_perplexity.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gallery_text_perplexity.py:


Perplexity
===============================

Perplexity is a measure of how well a probabilistic model predicts a sample. In the context of language
modeling, perplexity equals the exponential of the cross-entropy loss, so a lower perplexity score indicates
that the model is more certain about its predictions. Because Perplexity operates on token probabilities, it
is not suitable for scoring the decoded output of tasks such as text generation or machine translation;
instead, it is commonly used to evaluate the logits of generative language models.

.. GENERATED FROM PYTHON SOURCE LINES 12-13

Here's a Python example demonstrating how to use Perplexity to evaluate a generative language model.

.. GENERATED FROM PYTHON SOURCE LINES 13-19

.. code-block:: Python
   :lineno-start: 14

    import torch
    from transformers import AutoModelWithLMHead, AutoTokenizer

    from torchmetrics.text import Perplexity

.. GENERATED FROM PYTHON SOURCE LINES 20-21

Load the GPT-2 model and tokenizer.

.. GENERATED FROM PYTHON SOURCE LINES 21-25

.. code-block:: Python
   :lineno-start: 22

    model = AutoModelWithLMHead.from_pretrained("gpt2")
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    /opt/hostedtoolcache/Python/3.10.18/x64/lib/python3.10/site-packages/transformers/models/auto/modeling_auto.py:2178: FutureWarning: The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.
      warnings.warn(

.. GENERATED FROM PYTHON SOURCE LINES 26-27

Generate token logits for a sample text.

.. GENERATED FROM PYTHON SOURCE LINES 27-35

.. code-block:: Python
   :lineno-start: 28

    sample_text = "The quick brown fox jumps over the lazy dog"
    sample_input_ids = tokenizer.encode(sample_text, return_tensors="pt")
    with torch.no_grad():
        sample_outputs = model(sample_input_ids, labels=sample_input_ids)

    logits = sample_outputs.logits

.. GENERATED FROM PYTHON SOURCE LINES 36-37

We can now calculate the perplexity of the logits.

.. GENERATED FROM PYTHON SOURCE LINES 37-42

.. code-block:: Python
   :lineno-start: 38

    perplexity = Perplexity()
    score = perplexity(preds=logits, target=sample_input_ids)
    print(f"Perplexity, unshifted: {score.item()}")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Perplexity, unshifted: 1929.9822998046875

.. GENERATED FROM PYTHON SOURCE LINES 43-44

This perplexity score is suspiciously high. The cause is that the predictions and targets are misaligned: in a
causal language model, the logits at each position predict the *next* token. We can fix this by removing the
last token from the logits and the first token from the target.

.. GENERATED FROM PYTHON SOURCE LINES 44-48

.. code-block:: Python
   :lineno-start: 45

    score = perplexity(preds=logits[:, :-1], target=sample_input_ids[:, 1:])
    print(f"Perplexity, shifted: {score.item()}")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Perplexity, shifted: 227.27783203125
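As an optional sanity check (not part of the generated script above), the shifted perplexity can also be
reproduced directly from the logits with ``torch.nn.functional.cross_entropy``. The sketch below assumes the
``logits`` and ``sample_input_ids`` tensors defined earlier; the variable names are illustrative only.

.. code-block:: Python

    import torch.nn.functional as F

    # Flatten the shifted logits to (num_tokens, vocab_size) and the shifted
    # targets to (num_tokens,), average the token-level cross-entropy, and
    # exponentiate; this should reproduce the shifted perplexity reported above.
    shifted_preds = logits[:, :-1].reshape(-1, logits.size(-1))
    shifted_targets = sample_input_ids[:, 1:].reshape(-1)
    manual_perplexity = F.cross_entropy(shifted_preds, shifted_targets).exp()
    print(f"Perplexity, manual: {manual_perplexity.item()}")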
.. GENERATED FROM PYTHON SOURCE LINES 49-50

Since perplexity equals the exponential of the cross-entropy loss, we can verify the metric by comparing its
result with the exponential of the loss returned by the model.

.. GENERATED FROM PYTHON SOURCE LINES 50-55

.. code-block:: Python
   :lineno-start: 51

    metric_perplexity = score
    model_perplexity = sample_outputs.loss.exp()
    print(torch.allclose(metric_perplexity, model_perplexity))

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    True

.. GENERATED FROM PYTHON SOURCE LINES 56-57

Be aware that sequences are often padded to ensure equal length. In such cases, the padding tokens should be
ignored when calculating the perplexity. This can be achieved by specifying the ``ignore_index`` argument in
the ``Perplexity`` metric.

.. GENERATED FROM PYTHON SOURCE LINES 57-71

.. code-block:: Python
   :lineno-start: 58

    tokenizer.pad_token_id = tokenizer.eos_token_id
    sample_input_ids = tokenizer.encode(sample_text, return_tensors="pt", padding="max_length", max_length=20)
    with torch.no_grad():
        sample_outputs = model(sample_input_ids, labels=sample_input_ids)

    logits = sample_outputs.logits

    perplexity = Perplexity(ignore_index=None)
    score = perplexity(preds=logits[:, :-1], target=sample_input_ids[:, 1:])
    print(f"Perplexity, including padding: {score.item()}")

    perplexity = Perplexity(ignore_index=tokenizer.pad_token_id)
    score = perplexity(preds=logits[:, :-1], target=sample_input_ids[:, 1:])
    print(f"Perplexity, ignoring padding: {score.item()}")

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Perplexity, including padding: 24400.68359375
    Perplexity, ignoring padding: 227.27783203125

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 2.401 seconds)


.. _sphx_glr_download_gallery_text_perplexity.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: perplexity.ipynb <perplexity.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: perplexity.py <perplexity.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: perplexity.zip <perplexity.zip>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_