As noted in the picture above, validation and test logging can use sync_dist=True.
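For reference, here is a minimal sketch of what that pattern looks like in a validation step; the metric name val_loss is a placeholder, and reusing the same input helper as in training is an assumption, not something from the original code:

def validation_step(self, batch, batch_idx):
    inputs = self.train_inputs(batch)  # assumption: same input helper as in training_step
    loss, _ = self(**inputs)
    # sync_dist=True reduces (averages) the logged value across all processes before logging
    self.log('val_loss', loss, on_epoch=True, prog_bar=True, sync_dist=True)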
I wonder whether there is a way to synchronize during training as well. For example, in the code below, which I run on 8 GPUs, I want train_loss and train_acc to be averaged across the 8 GPUs:
def training_step(self, batch, batch_idx):
    inputs = self.train_inputs(batch)
    loss, logits = self(**inputs)
    # exclude positions whose label is 5 from the accuracy computation
    mask = (batch['labels'] != 5).long()
    ntotal = mask.sum()
    ncorrect = ((logits.argmax(dim=-1) == batch['labels']).long() * mask).sum()
    acc = ncorrect / ntotal
    self.log('train_loss', loss, on_step=True, prog_bar=True, sync_dist=True)
    self.log("train_acc", acc, on_step=True, prog_bar=True, sync_dist=True)
    return loss