RAM held by workers after validation

I’m training with DDP on 2 GPUs, with 8 workers in each dataloader (set up via a PL datamodule).

After a validation loop, I can see 16 workers with 0% CPU utilization sitting on a bunch of RAM. Is this normal, and if not, how should I go about releasing this memory?

Here’s how my loaders are set up:

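Roughly like this (simplified sketch; the datamodule name, dataset attributes, and batch size are placeholders, but `num_workers=8` and `persistent_workers=True` are what I’m actually using):

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader


class MyDataModule(pl.LightningDataModule):
    def train_dataloader(self):
        return DataLoader(
            self.train_ds,            # placeholder dataset
            batch_size=32,            # placeholder value
            shuffle=True,
            num_workers=8,
            persistent_workers=True,  # workers stay alive between epochs
        )

    def val_dataloader(self):
        return DataLoader(
            self.val_ds,              # placeholder dataset
            batch_size=32,            # placeholder value
            num_workers=8,
            persistent_workers=True,
        )
```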

Hey, according to the PyTorch docs for DataLoader:

persistent_workers: If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This allows the workers’ Dataset instances to stay alive. (default: False)

The default in Lightning is also False, but you are explicitly setting it to True. So I think it is expected that the workers remain alive, holding their memory, until the next validation phase, when they will do work again.
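If releasing that RAM between runs matters more to you than avoiding worker start-up cost, a minimal sketch of the alternative (self.val_ds and the batch size are placeholders from your setup above):

```python
from torch.utils.data import DataLoader


def val_dataloader(self):
    return DataLoader(
        self.val_ds,                # placeholder dataset
        batch_size=32,              # placeholder value
        num_workers=8,
        persistent_workers=False,   # default: workers exit (and free their RAM)
                                    # once the dataset has been consumed
    )
```

The trade-off is that workers are re-spawned for every epoch and validation run, which is exactly the overhead persistent_workers=True is meant to avoid.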
