Ignore log in one of the GPUs as it does not have a specific loss

orena1 · October 22, 2023, 4:27pm

Hi,
I have some code that runs several different evaluations, something like this:

    import numpy as np

    def on_validation_epoch_end(self):
        for subset in self.validation_step_outputs:
            self.length_and_dist = np.array(self.validation_step_outputs[subset])
            if len(self.length_and_dist):
                w_errr = self.length_and_dist.mean()
                self.log(f"Error/{subset}_L", w_errr, sync_dist=True)
            else:
                print(f'no samples in subset = {subset} on this GPU')

The problem is that some subsets are very small, so some GPUs will not have a w_errr for them, and I get into a deadlock. How can I overcome this problem? I cannot send a 0 when the subset is not found on a GPU.

Thanks
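Why the skipped log deadlocks: with `sync_dist=True`, `self.log` reduces the value across processes, which is a collective operation. Every rank has to make the same collective call before any of them can continue, so a rank that takes the `else` branch never joins and the other ranks keep waiting (in real DDP, typically until the process-group timeout). The sketch below only simulates this in plain Python, with a `threading.Barrier` standing in for the collective; `simulate_synced_log` is a hypothetical helper, not a Lightning or PyTorch API.

```python
import threading
from typing import List, Optional


def simulate_synced_log(per_rank_values: List[Optional[float]], timeout: float = 0.5) -> List[int]:
    """Simulate one `self.log(..., sync_dist=True)` call across ranks.

    Hypothetical helper: a threading.Barrier stands in for the collective
    reduction. A rank whose value is None skips the call, like the `else`
    branch in the question. Returns the ranks left waiting for a missing rank.
    """
    world_size = len(per_rank_values)
    barrier = threading.Barrier(world_size)
    stuck: List[int] = []
    lock = threading.Lock()

    def rank_worker(rank: int, value: Optional[float]) -> None:
        if value is None:
            return  # this rank never reaches the collective
        try:
            # Every rank must arrive here before any of them can continue.
            barrier.wait(timeout=timeout)
        except threading.BrokenBarrierError:
            # Raised once the wait times out, i.e. some rank never arrived.
            with lock:
                stuck.append(rank)

    threads = [
        threading.Thread(target=rank_worker, args=(rank, value))
        for rank, value in enumerate(per_rank_values)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(stuck)


# All three ranks have samples for the subset: nobody is left waiting.
print(simulate_synced_log([0.7, 0.8, 0.75]))  # prints []
# Rank 2 has no samples and skips the call: ranks 0 and 1 are left waiting.
print(simulate_synced_log([0.7, 0.8, None]))  # prints [0, 1]
```

The timeout only exists so the simulation terminates; the point is that the fix has to keep every rank calling the synced log, whether or not it saw the subset.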

awaelchli · October 24, 2023, 10:36pm

I think you answered your question yourself: “send a 0 in case the subset is not found”.

    if len(self.length_and_dist):
        w_err = self.length_and_dist.mean()
    else:
        w_err = 0

    self.log(f"Error/{subset}_L", w_err, sync_dist=True)

orena1 · October 24, 2023, 10:54pm

Thanks, but I think this assumes that the loss/error on one of the GPUs is 0, which would not be accurate. For example, with 3 GPUs, the errors on GPUs 1 and 2 are 0.7 and 0.8, and GPU 3 has none of these samples. If GPU 3 sends 0, the mean over those samples will not be accurate, no? Or maybe I am missing something.
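One way to address both the deadlock and the bias is to have every rank contribute its local sum of errors and its local sample count, reduce both with a sum across ranks, and divide once at the end. A rank with no samples contributes (0, 0), so it still takes part in the collective without pulling the mean toward 0. The sketch below simulates only the arithmetic in plain Python; `reduce_mean_zero_padded` and `reduce_mean_weighted` are hypothetical helpers. In Lightning, the same idea would presumably mean sum-reducing two tensors on every rank (for example with `torch.distributed.all_reduce`) before logging the quotient, which is an assumption about how you would wire it up rather than something confirmed in this thread.

```python
from typing import Sequence


def reduce_mean_zero_padded(per_rank_errors: Sequence[Sequence[float]]) -> float:
    """Mean of per-rank means, where an empty rank sends 0 (the suggestion above)."""
    local_means = [sum(e) / len(e) if e else 0.0 for e in per_rank_errors]
    return sum(local_means) / len(local_means)


def reduce_mean_weighted(per_rank_errors: Sequence[Sequence[float]]) -> float:
    """Exact global mean: sum-reduce (local_sum, local_count), then divide once."""
    total_sum = 0.0
    total_count = 0
    for errors in per_rank_errors:
        # An empty rank still takes part, contributing (0.0, 0).
        total_sum += sum(errors)
        total_count += len(errors)
    if total_count == 0:
        return float("nan")  # no rank saw this subset at all
    return total_sum / total_count


# The example from the post: GPUs 1 and 2 see errors 0.7 and 0.8, GPU 3 sees nothing.
per_rank = [[0.7], [0.8], []]
print(round(reduce_mean_zero_padded(per_rank), 4))  # prints 0.5
print(round(reduce_mean_weighted(per_rank), 4))  # prints 0.75
```

Reducing sums and counts instead of per-rank means also stays exact when the ranks hold different numbers of samples, where averaging only the non-empty ranks' means would still be off.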
