Hi,
I have some code that runs different evaluation, something like this:
def on_validation_epoch_end(self):
for subset in self.validation_step_outputs:
self.length_and_dist = np.array(self.validation_step_outputs[subset])
if len(self.length_and_dist):
w_errr = self.length_and_dist.mean()
self.log(f"Error/{subset}_L", w_errr, sync_dist=True)
else:
print(f'no smaples in subset = {subset} at this GPU')
The problem is that some substes are very small and some GPUs will not have w_errr
and I get into a deadlock. How can I overcome this probolem. I can not send a 0 in case the subset is not found at the gpu.
Thanks