Hello,
I’m trying to run PyTorch Lightning (0.8.5) with Horovod on a multi-GPU machine.
The issue I’m facing is that rank_zero_only.rank is always zero in every process (it’s a 4-GPU machine, so Horovod launches 4 processes).
By inspecting the environment, I saw that none of the 4 processes has a LOCAL_RANK environment variable; instead, each one has OMPI_COMM_WORLD_LOCAL_RANK (set to 0 through 3).
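For reference, this is roughly how I’m checking it in each process (a minimal sketch; I’m assuming rank_zero_only can be imported from pytorch_lightning.utilities in 0.8.5):

```python
import os

# What each of the 4 Horovod processes reports:
print("LOCAL_RANK:", os.environ.get("LOCAL_RANK"))  # None in every process
print("OMPI_COMM_WORLD_LOCAL_RANK:",
      os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK"))  # "0" through "3"

from pytorch_lightning.utilities import rank_zero_only
print("rank_zero_only.rank:", rank_zero_only.rank)  # always 0
```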
Could that be why rank_zero_only isn’t working? Where is the LOCAL_RANK env var supposed to come from?
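For now I’m considering a workaround like the one below (just a sketch, assuming rank_zero_only.rank is initialized from LOCAL_RANK when pytorch_lightning is first imported), but I’d like to understand the proper fix:

```python
import os

# Copy the OpenMPI local rank into LOCAL_RANK *before* pytorch_lightning
# is imported, so that rank_zero_only.rank picks it up.
if "LOCAL_RANK" not in os.environ and "OMPI_COMM_WORLD_LOCAL_RANK" in os.environ:
    os.environ["LOCAL_RANK"] = os.environ["OMPI_COMM_WORLD_LOCAL_RANK"]

import pytorch_lightning as pl  # noqa: E402
```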
Thanks,
Stefano