Memory leak after the first validation epoch

I’m facing a very annoying problem with PyTorch Lightning. I’ve been struggling with it for two days and can’t see a way out. I’ve read every post I could find but couldn’t locate a solution, so I’m hoping for your help.

The problem I’m facing is the following: using the `free -h` command, I see used memory increase by about 0.4 GB after the first validation epoch, right after the logging is printed to the console. This memory is not released even after the program terminates, and after many training attempts an out-of-memory error occurs.
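For completeness, here is a minimal sketch (not in my repo; it assumes `psutil` is installed) of a Lightning callback I could add to confirm the jump from inside the run, mirroring what `free -h` shows from outside:

```python
import psutil
from pytorch_lightning import Callback


class MemoryProbe(Callback):
    # Prints the resident set size (RSS) of this process after every
    # validation epoch, so the jump can be tied to a specific epoch.
    def on_validation_epoch_end(self, trainer, pl_module):
        rss_gb = psutil.Process().memory_info().rss / 1024 ** 3
        print(f"epoch {trainer.current_epoch}: RSS = {rss_gb:.2f} GB")
```

This would be passed to the trainer via `Trainer(callbacks=[MemoryProbe()])`.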

I have no idea how to fix this. I tried commenting out the logging and ran several memory trackers (such as tracemalloc), but got no clear picture of what is going on.
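For example, this is roughly the kind of tracemalloc comparison I ran between two validation epochs (a sketch, not the exact code from my repo):

```python
import tracemalloc

tracemalloc.start()

# ... run training until just after the first validation epoch ...
snap_before = tracemalloc.take_snapshot()

# ... run one more validation epoch ...
snap_after = tracemalloc.take_snapshot()

# Show the ten call sites whose Python-level allocations grew the most.
for stat in snap_after.compare_to(snap_before, "lineno")[:10]:
    print(stat)
```

As far as I understand, tracemalloc only sees allocations made through the Python allocator, so memory held by PyTorch’s C++ side would not show up here, which may be why it told me nothing useful.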

You can find the code here: GitHub - TunguskaMed/ML_EX2. Unfortunately I haven’t written a README yet, but from main.py you can launch a training run with the hyperparameter combinations set in utils.py. The inputs are 96x96 color images.

Thanks in advance to everyone who tries to help! This is important to me, as it is for a university project.