GPU training (Basic)
Audience: Users looking to save money and run large models faster using single or multiple GPUs.
What is a GPU?
A Graphics Processing Unit (GPU) is a specialized hardware accelerator designed to speed up the mathematical computations used in gaming and deep learning.
Train on 1 GPU
Make sure you’re running on a machine with at least one GPU. There’s no need to specify any NVIDIA flags as Lightning will do it for you.
trainer = Trainer(accelerator="gpu", devices=1)
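The "at least one GPU" requirement can be checked before the Trainer is built. The sketch below is not part of Lightning: `pick_trainer_kwargs` is a hypothetical helper that takes the number of visible GPUs (for example from `torch.cuda.device_count()`) and returns keyword arguments for `Trainer`, falling back to CPU when no GPU is visible.

```python
def pick_trainer_kwargs(visible_gpus: int, wanted: int = 1) -> dict:
    """Hypothetical helper: build Trainer kwargs from the number of visible GPUs."""
    if wanted < 1:
        raise ValueError("wanted must be at least 1")
    if visible_gpus >= wanted:
        return {"accelerator": "gpu", "devices": wanted}
    if visible_gpus > 0:
        # Fewer GPUs than requested: use the ones that are there.
        return {"accelerator": "gpu", "devices": visible_gpus}
    # No GPU visible: fall back to a single CPU process.
    return {"accelerator": "cpu", "devices": 1}


# Typical use (assumes torch and Lightning are installed):
#   import torch
#   trainer = Trainer(**pick_trainer_kwargs(torch.cuda.device_count()))
print(pick_trainer_kwargs(visible_gpus=0))
print(pick_trainer_kwargs(visible_gpus=4, wanted=2))
```

Whether a silent CPU fallback is desirable depends on the job; for long training runs it may be safer to raise an error instead.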
Train on multiple GPUs
To use multiple GPUs, set devices in the Trainer to either the number of GPUs to use or the indices of specific GPUs.
trainer = Trainer(accelerator="gpu", devices=4)
Choosing GPU devices
You can select the GPU devices using ranges, a list of indices, or a string containing a comma-separated list of GPU IDs:
# DEFAULT (int) specifies how many GPUs to use per node
Trainer(accelerator="gpu", devices=k)
# Above is equivalent to
Trainer(accelerator="gpu", devices=list(range(k)))
# Specify which GPUs to use (don't use when running on a cluster)
Trainer(accelerator="gpu", devices=[0, 1])
# Equivalent using a string
Trainer(accelerator="gpu", devices="0, 1")
# To use all available GPUs put -1 or '-1'
# equivalent to list(range(torch.cuda.device_count()))
Trainer(accelerator="gpu", devices=-1)
The table below lists examples of possible input formats and how they are interpreted by Lightning.
| devices | Type | Parsed | Meaning |
|---|---|---|---|
| 3 | int | [0, 1, 2] | first 3 GPUs |
| -1 | int | [0, 1, 2, …] | all available GPUs |
| [0] | list | [0] | GPU 0 |
| [1, 3] | list | [1, 3] | GPUs 1 and 3 |
| "3" | str | [0, 1, 2] | first 3 GPUs |
| "1, 3" | str | [1, 3] | GPUs 1 and 3 |
| "-1" | str | [0, 1, 2, …] | all available GPUs |
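The input formats above can be mirrored in a few lines of plain Python. This is an illustrative sketch only: `parse_devices` and its `available` parameter are hypothetical names, and Lightning's own parser may differ in validation and edge cases.

```python
from typing import List, Union


def parse_devices(devices: Union[int, str, List[int]], available: int) -> List[int]:
    """Illustrative only: map a `devices` value to a list of GPU indices."""
    if isinstance(devices, str):
        parts = [p.strip() for p in devices.split(",") if p.strip()]
        if len(parts) == 1:
            devices = int(parts[0])            # "3" or "-1" behaves like the int
        else:
            devices = [int(p) for p in parts]  # "1, 3" behaves like the list
    if isinstance(devices, int):
        if devices == -1:
            indices = list(range(available))   # all available GPUs
        elif devices < 0:
            raise ValueError(f"invalid device count: {devices}")
        else:
            indices = list(range(devices))     # first k GPUs
    else:
        indices = list(devices)                # explicit GPU indices
    # Reject indices that do not exist on this (hypothetical) machine.
    bad = [i for i in indices if i < 0 or i >= available]
    if bad:
        raise ValueError(f"GPU indices {bad} not available (found {available} GPUs)")
    return indices


for value in (3, -1, [1, 3], "3", "1, 3", "-1"):
    print(repr(value), "->", parse_devices(value, available=4))
```

Here `available` stands in for the value of `torch.cuda.device_count()`, passed explicitly so the sketch runs on a machine without a GPU.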
Note
When specifying the number of devices as an integer (devices=k), setting the trainer flag auto_select_gpus=True will automatically help you find k GPUs that are not occupied by other processes. This is especially useful when GPUs are configured to be in “exclusive mode”, such that only one process at a time can access them. For more details see the trainer guide.
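The idea behind selecting unoccupied GPUs can be sketched as a simple scan over candidate devices. This is not Lightning's implementation: `first_free_gpus` is a hypothetical helper, and `is_free` is a stand-in for whatever probe is used to decide that a device is unoccupied.

```python
from typing import Callable, Iterable, List


def first_free_gpus(k: int, candidates: Iterable[int],
                    is_free: Callable[[int], bool]) -> List[int]:
    """Hypothetical helper: return the first k candidate indices that pass is_free."""
    if k < 1:
        raise ValueError("k must be at least 1")
    chosen: List[int] = []
    for idx in candidates:
        if is_free(idx):
            chosen.append(idx)
            if len(chosen) == k:
                return chosen
    raise RuntimeError(f"only {len(chosen)} of {k} requested GPUs are free")


# Simulated machine with 4 GPUs where devices 0 and 2 are busy.
busy = {0, 2}
print(first_free_gpus(2, range(4), lambda i: i not in busy))
```

The resulting list could then be passed as the devices argument in place of an integer.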