I’m training a model across two GPUs on patient data (identified by an id). In my test steps, I output dictionaries that contain the id as well as all the metrics. I store these (a list with one dict per id) at the end of the test epoch, so that I can statistically evaluate model performance later on.
I’m experiencing a problem with the test step, however.
# Test step
def test_step(self, batch, batch_idx):
    # Get input, target and patient id
    x, y, id = batch["input"], batch["target"], batch["id"]
    # Infer and time inference
    start = time()
    y_hat = self.test_inference(x, self, **self.test_inference_params)
    end = time()
    # Unpack a single id, otherwise keep the ids as a tuple
    id = id[0] if len(id) == 1 else tuple(id)
    # Output dict with duration of inference
    output = {"id": id, "time": end - start}
    # Add other metrics to output dict
    for m, pars in zip(self.metrics, self.metrics_params):
        metric_value = m(y_hat, y, **pars)
        if hasattr(metric_value, "item"):
            metric_value = metric_value.item()
        output[f"test_{m.__name__}"] = metric_value
    return output
# Test epoch end (= test end)
def test_epoch_end(self, outputs):
    # Go over outputs and gather
    self.test_results = outputs  # self.all_gather(outputs)
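After a test run, self.test_results is then a list with one dict per patient, roughly along these lines (the metric name and values here are purely illustrative):

    # Illustrative only -- the actual metric names depend on self.metrics
    [
        {"id": "patient_001", "time": 0.42, "test_dice": 0.91},
        {"id": "patient_002", "time": 0.40, "test_dice": 0.87},
        ...
    ]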
I hadn’t considered this before (as I’m used to training on a single GPU), but with DDP the test_results attribute now only contains half of the outputs (one half per process). So when my main script reaches this section, only half of the outputs are effectively stored:
log("Evaluating model.")
trainer.test(model=model,
dataloaders=brats.val_dataloader())
results = model.test_results
# Save test results
log("Saving results.")
np.save(file=join(result_dir, f'{model_name}_v{version}_fold{fold_index}.npy'), arr=results)
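(I realise that under DDP this script also runs once per process, so once the outputs are merged I would presumably restrict the save to the main process, e.g. via trainer.is_global_zero. A rough sketch of what I have in mind, assuming the merged results end up available on every rank; this part is not my actual problem:)

    # Only write the results file from rank 0
    if trainer.is_global_zero:
        np.save(file=join(result_dir, f'{model_name}_v{version}_fold{fold_index}.npy'),
                arr=results)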
I have read about the self.all_gather method, but I’m not sure it suits my needs: I want to concatenate the lists, not reduce anything, and the outputs are dicts, not Tensors. How can I store all the dicts from both DDP processes?
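The closest thing I have found is torch.distributed.all_gather_object, which (as far as I understand) gathers arbitrary picklable objects rather than tensors. A rough sketch of what I mean, assuming it plays nicely with the process group that Lightning sets up for DDP:

    import torch.distributed as dist

    def test_epoch_end(self, outputs):
        # Sketch: gather each rank's list of dicts onto every process,
        # then flatten them into a single list with one dict per patient.
        if dist.is_available() and dist.is_initialized():
            gathered = [None] * dist.get_world_size()
            dist.all_gather_object(gathered, outputs)
            outputs = [o for rank_outputs in gathered for o in rank_outputs]
        self.test_results = outputs

Is something along these lines the recommended approach, or does Lightning have a built-in way to collect non-tensor outputs across processes?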