:orphan:

.. _gpu_basic:

GPU training (Basic)
====================
**Audience:** Users looking to save money and run large models faster using single or multiple

----

What is a GPU?
--------------
A Graphics Processing Unit (GPU), is a specialized hardware accelerator designed to speed up mathematical computations used in gaming and deep learning.

----

.. _multi_gpu:

Train on GPUs
-------------

The Trainer will run on all available GPUs by default. Make sure you're running on a machine with at least one GPU.
There's no need to specify any NVIDIA flags as Lightning will do it for you.

.. code-block:: python

    # run on as many GPUs as available by default
    trainer = Trainer(accelerator="auto", devices="auto", strategy="auto")
    # equivalent to
    trainer = Trainer()

    # run on one GPU
    trainer = Trainer(accelerator="gpu", devices=1)
    # run on multiple GPUs
    trainer = Trainer(accelerator="gpu", devices=8)
    # choose the number of devices automatically
    trainer = Trainer(accelerator="gpu", devices="auto")

.. note::
    Setting ``accelerator="gpu"`` will also automatically choose the "mps" device on Apple sillicon GPUs.
    If you want to avoid this, you can set ``accelerator="cuda"`` instead.

Choosing GPU devices
^^^^^^^^^^^^^^^^^^^^

You can select the GPU devices using ranges, a list of indices or a string containing
a comma separated list of GPU ids:

.. testsetup::

    k = 1

.. testcode::
    :skipif: torch.cuda.device_count() < 2

    # DEFAULT (int) specifies how many GPUs to use per node
    Trainer(accelerator="gpu", devices=k)

    # Above is equivalent to
    Trainer(accelerator="gpu", devices=list(range(k)))

    # Specify which GPUs to use (don't use when running on cluster)
    Trainer(accelerator="gpu", devices=[0, 1])

    # Equivalent using a string
    Trainer(accelerator="gpu", devices="0, 1")

    # To use all available GPUs put -1 or '-1'
    # equivalent to `list(range(torch.cuda.device_count())) and `"auto"`
    Trainer(accelerator="gpu", devices=-1)

The table below lists examples of possible input formats and how they are interpreted by Lightning.

+------------------+-----------+---------------------+---------------------------------+
| `devices`        | Type      | Parsed              | Meaning                         |
+==================+===========+=====================+=================================+
| 3                | int       | [0, 1, 2]           | first 3 GPUs                    |
+------------------+-----------+---------------------+---------------------------------+
| -1               | int       | [0, 1, 2, ...]      | all available GPUs              |
+------------------+-----------+---------------------+---------------------------------+
| [0]              | list      | [0]                 | GPU 0                           |
+------------------+-----------+---------------------+---------------------------------+
| [1, 3]           | list      | [1, 3]              | GPU index 1 and 3 (0-based)     |
+------------------+-----------+---------------------+---------------------------------+
| "3"              | str       | [0, 1, 2]           | first 3 GPUs                    |
+------------------+-----------+---------------------+---------------------------------+
| "1, 3"           | str       | [1, 3]              | GPU index 1 and 3 (0-based)     |
+------------------+-----------+---------------------+---------------------------------+
| "-1"             | str       | [0, 1, 2, ...]      | all available GPUs              |
+------------------+-----------+---------------------+---------------------------------+


Find usable CUDA devices
^^^^^^^^^^^^^^^^^^^^^^^^

If you want to run several experiments at the same time on your machine, for example for a hyperparameter sweep, then you can
use the following utility function to pick GPU indices that are "accessible", without having to change your code every time.

.. code-block:: python

    from lightning.pytorch.accelerators import find_usable_cuda_devices

    # Find two GPUs on the system that are not already occupied
    trainer = Trainer(accelerator="cuda", devices=find_usable_cuda_devices(2))

    from lightning.fabric.accelerators import find_usable_cuda_devices

    # Works with Fabric too
    fabric = Fabric(accelerator="cuda", devices=find_usable_cuda_devices(2))


This is especially useful when GPUs are configured to be in "exclusive compute mode", such that only one process at a time is allowed access to the device.
This special mode is often enabled on server GPUs or systems shared among multiple users.