Error logging DDP Trainer metrics in a remote function

Hi,
I’m running a DDP training loop inside a remote function. I use Pyro4 to communicate between the calling object and remote function. I use ddp_spwan as suggested. My expectation is to be able to send the training metrics via the callback function. The Pyro4 code is stable and works with CPUs, single GPU and if a dummy result is sent.

However, when I try to log trainer metrics with self.log or self.log_dict while training on multiple GPUs, the callback fails to trigger correctly. The code trains fine without the logging statements.
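
Schematically, the remote side looks roughly like this (simplified; the real module is larger, and the Trainer arguments follow the newer accelerator/devices/strategy API):

```python
class LitModel(pl.LightningModule):
    # ... layers, forward, configure_optimizers omitted ...

    def training_step(self, batch, batch_idx):
        loss = self._compute_loss(batch)  # placeholder for the real loss computation
        # This call is what seems to break the Pyro4 callback on multiple GPUs:
        self.log("train_loss", loss)
        return loss

def remote_fit(callback_uri):
    """The Pyro4-exposed remote function, roughly."""
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=2,               # multi-GPU case where the callback fails
        strategy="ddp_spawn",    # as suggested for this setup
        callbacks=[MetricForwarder(callback_uri)],
    )
    trainer.fit(LitModel())
```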

Is there a workaround for this? I need to be able to send the logged metrics. I also tried using the wandb and tensorboard libraries directly (not the built-in Lightning loggers), and that throws an error during pickling.

Any suggestions would help.
Thank you! 🙂