RAM held by workers after validation

I’m training with DDP on 2 GPUs, with 8 workers in each dataloader (set up via a PL datamodule).

After a validation loop, I can see 16 workers with 0% CPU utilization sitting on a bunch of RAM. Is this normal, and if not, how should I go about releasing this memory?

Here’s how my loaders are set up:

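Roughly like this (simplified sketch; the datamodule name, dataset attributes, and batch size are placeholders, but `num_workers=8` and `persistent_workers=True` are what I’m actually using):

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader


class MyDataModule(pl.LightningDataModule):
    def train_dataloader(self):
        return DataLoader(
            self.train_ds,            # placeholder dataset
            batch_size=32,            # placeholder value
            shuffle=True,
            num_workers=8,
            persistent_workers=True,  # workers stay alive between epochs
        )

    def val_dataloader(self):
        return DataLoader(
            self.val_ds,              # placeholder dataset
            batch_size=32,            # placeholder value
            num_workers=8,
            persistent_workers=True,
        )
```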

Hey, according to the PyTorch docs for DataLoader:

persistent_workers: If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This allows the workers’ Dataset instances to stay alive. (default: False)

The default in Lightning is also False, but you are explicitly setting it to True. So I think it is expected that the workers remain alive, holding their memory, until the next validation phase, when they will do work again.
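If releasing that RAM between runs matters more to you than avoiding worker start-up cost, a minimal sketch of the alternative (self.val_ds and the batch size are placeholders from your setup above):

```python
from torch.utils.data import DataLoader


def val_dataloader(self):
    return DataLoader(
        self.val_ds,                # placeholder dataset
        batch_size=32,              # placeholder value
        num_workers=8,
        persistent_workers=False,   # default: workers exit (and free their RAM)
                                    # once the dataset has been consumed
    )
```

The trade-off is that workers are re-spawned for every epoch and validation run, which is exactly the overhead persistent_workers=True is meant to avoid.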
