Can't train when the NVIDIA driver is too old (even with the CPU accelerator)

Hi,

My NVIDIA GPU is too old: a GeForce GT 620 with this driver version:

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  390.157  Wed Oct 12 09:19:07 UTC 2022
GCC version:

So I only want to use the CPU in this example.
I added accelerator="cpu" to force the trainer NOT to use the GPU but the CPU instead:
trainer = L.Trainer(accelerator="cpu", limit_train_batches=100, max_epochs=1)
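
For context, the rest of the script is basically the autoencoder example from the Lightning docs; a sketch of what my lit_auto_encoder.py looks like (details may differ slightly):

import torch
import torch.nn.functional as F
import torchvision
from torch import nn, utils
import lightning as L

class LitAutoEncoder(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 28 * 28))

    def training_step(self, batch, batch_idx):
        # Flatten the MNIST image and reconstruct it through the bottleneck
        x, _ = batch
        x = x.view(x.size(0), -1)
        x_hat = self.decoder(self.encoder(x))
        return F.mse_loss(x_hat, x)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

dataset = torchvision.datasets.MNIST(".", download=True, transform=torchvision.transforms.ToTensor())
train_loader = utils.data.DataLoader(dataset)

autoencoder = LitAutoEncoder()
trainer = L.Trainer(accelerator="cpu", limit_train_batches=100, max_epochs=1)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)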

But I still get this error:

GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/lightning/pytorch/trainer/setup.py:187: GPU available but not used. You can set it by doing `Trainer(accelerator='gpu')`.

  | Name    | Type       | Params
---------------------------------------
0 | encoder | Sequential | 50.4 K
1 | decoder | Sequential | 51.2 K
---------------------------------------
101 K     Trainable params
0         Non-trainable params
101 K     Total params
0.407     Total estimated model params size (MB)
Traceback (most recent call last):
  File "/home/phil/Test/Lightning/lit_auto_encoder.py", line 45, in <module>
    trainer.fit(model=autoencoder, train_dataloaders=train_loader)
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1032, in _run_stage
    with isolate_rng():
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/lightning/pytorch/utilities/seed.py", line 43, in isolate_rng
    states = _collect_rng_states(include_cuda)
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/lightning/fabric/utilities/seed.py", line 118, in _collect_rng_states
    states["torch.cuda"] = torch.cuda.get_rng_state_all()
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/torch/cuda/random.py", line 47, in get_rng_state_all
    results.append(get_rng_state(i))
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/torch/cuda/random.py", line 30, in get_rng_state
    _lazy_init()
  File "/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

What can I do to solve this problem?

Philippe

Hey Philippe,

Just a quick question, what does this return on your system?

import torch
print(torch.cuda.is_available())

Or does it also error out?

Hi,

(sorry for the delay, I was on holiday)

Here’s the output.

>>> print(torch.cuda.is_available())
/home/phil/.virtualenvs/lightning/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

Thanks. I think in this case we can fix this in Lightning so that it avoids calling torch.cuda.get_rng_state_all() when CUDA cannot actually be initialized.
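
Roughly the kind of guard I mean (just a sketch of the idea, not an actual patch; the real helper also collects the Python and NumPy RNG states):

import torch

def _collect_rng_states(include_cuda: bool) -> dict:
    # Sketch of a guarded lightning.fabric.utilities.seed._collect_rng_states:
    # torch.cuda.is_available() returns False (with a warning) on a driver
    # that is too old, so checking it first avoids the RuntimeError raised
    # by torch.cuda.get_rng_state_all() -> _lazy_init().
    states = {"torch": torch.get_rng_state()}
    if include_cuda and torch.cuda.is_available():
        states["torch.cuda"] = torch.cuda.get_rng_state_all()
    return states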

There is no easy fix you can apply directly on your side in the meantime, though, I think. You could try running

export CUDA_VISIBLE_DEVICES=""

in your terminal before you launch your script. But this might not work (I can’t really test it myself).
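
If the shell variable doesn't do it, setting it at the very top of the script, before torch is imported, should be equivalent; a minimal sketch:

import os

# Hide every GPU from CUDA before torch is imported, so the runtime
# never tries to initialize the too-old driver.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

print(torch.cuda.is_available())  # False, and torch.cuda.device_count() is 0

With no visible devices, torch.cuda.get_rng_state_all() has zero devices to iterate over, so it never touches the driver and the RuntimeError above can't trigger.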

Perfect, it works :grinning: