Hello, I’m currently using wandb as a logger. While looking at the logs, I noticed that GPU utilization is very low. I don’t know whether this is a Colab issue, or whether it can be solved.
My Trainer settings are simple: gpus=1, precision=16
Thank you!
Have you tried increasing your batch size?
I’ve been trying various values of batch_size over the last few days. The only thing that changes is the GPU memory allocated; utilization is still <10%.
What kind of data do you use? Can you try to increase your number of workers? How’s your CPU utilisation?
I’m sorry, I’ve been sick these past couple of weeks. My CPU utilization is always ~99% because I have albumentations running for data augmentation. I’ve tried using more num_workers, but it doesn’t seem to change GPU utilization much.
The processing on the CPU is clearly a bottleneck here.
If you do many expensive data augmentations and your code is already optimized, you have these options:
- move data augmentation to the GPU and use CPU workers only to load raw data (this can be tricky depending on what you do, and is not guaranteed to give a speedup)
- preprocess your data entirely offline
- buy a desktop/server with more CPU cores
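As a rough illustration of the offline option, here is a minimal sketch that runs a transform once per item and stores the results on disk, so the training loop only has to load them. The names `preprocess_offline`, `load_preprocessed`, and the `transform` argument are hypothetical and not part of any library. Only deterministic steps (resizing, normalization, decoding) make sense to cache this way; random augmentations would still need to run at training time.

```python
import pickle
from pathlib import Path


def preprocess_offline(raw_items, transform, cache_dir):
    """Apply `transform` once per item and store each result on disk.

    `transform` is a hypothetical stand-in for the deterministic part of
    the pipeline. Returns the list of cache file paths, in input order.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, item in enumerate(raw_items):
        path = cache_dir / f"{index:08d}.pkl"
        if not path.exists():  # skip items cached by an earlier run
            with path.open("wb") as f:
                pickle.dump(transform(item), f)
        paths.append(path)
    return paths


def load_preprocessed(path):
    """Load one cached item; cheap enough to call from a dataset's __getitem__."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

A dataset would then keep the returned paths and call `load_preprocessed(paths[i])` per sample, which leaves the CPU workers free for whatever random augmentation remains.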
Have you tried without wandb?
I can’t be sure where the problem was, but in my experience I got 2x more utilization after removing wandb.
I haven’t actually checked that, but the time to train an epoch is about the same with and without wandb.
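For anyone wanting to measure this rather than guess, here is a minimal sketch that splits wall-clock time into "waiting for the next batch" and "running the training step". The helper `time_data_vs_compute` and its `step_fn` argument are hypothetical names, and it works on any iterable of batches. One assumption to keep in mind: CUDA work is launched asynchronously, so `step_fn` should synchronize the device (for example with `torch.cuda.synchronize()`) before returning, otherwise the step time will look too small.

```python
import time


def time_data_vs_compute(batches, step_fn, max_batches=50):
    """Return (seconds spent waiting for data, seconds spent in step_fn).

    `batches` is any iterable (for example a data loader) and `step_fn`
    is a hypothetical callable that runs one training step on a batch.
    """
    data_time = 0.0
    step_time = 0.0
    iterator = iter(batches)
    for _ in range(max_batches):
        start = time.perf_counter()
        try:
            batch = next(iterator)  # blocks while the input pipeline works
        except StopIteration:
            break
        data_time += time.perf_counter() - start

        start = time.perf_counter()
        step_fn(batch)
        step_time += time.perf_counter() - start
    return data_time, step_time
```

If the first number dominates, the input pipeline (here, the CPU-side augmentation) is what keeps the GPU idle, and running the same measurement with and without the logger shows whether logging changes anything.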