Hello,
I’ve cloned this repository, which uses pytorch_lightning==1.7.7.
When trying to update pytorch_lightning to 2.2.4, the only code-breaking issue was the use of the test_epoch_end hook:
def test_epoch_end(self, outputs):
    unmerged_metrics = {}
    for metrics in outputs:
        for k, v in metrics.items():
            if k not in unmerged_metrics:
                unmerged_metrics[k] = []
            unmerged_metrics[k].append(v)
    merged_metrics = {}
    for k, v in unmerged_metrics.items():
        merged_metrics[k] = float(np.mean(v))
    self.logger.log_metrics(merged_metrics, step=self.global_step)
Following this PR, which is mentioned in the documentation, I switched test_epoch_end to:
def on_test_epoch_end(self):
    unmerged_metrics = {}
    for metrics in self.outputs:
        for k, v in metrics.items():
            if k not in unmerged_metrics:
                unmerged_metrics[k] = []
            unmerged_metrics[k].append(v)
    merged_metrics = {}
    for k, v in unmerged_metrics.items():
        merged_metrics[k] = float(np.mean(v))
    self.logger.log_metrics(merged_metrics, step=self.global_step)
    self.outputs.clear()
I also added the outputs list to the parent class of TSPModel:
class COMetaModel(pl.LightningModule):  # Parent class of TSPModel
    def __init__(self, param_args, node_feature_only=False):
        super(COMetaModel, self).__init__()
        # ... initialization code ...
        self.outputs = []
The constructor of TSPModel:
class TSPModel(COMetaModel):
    def __init__(self, param_args=None):
        super(TSPModel, self).__init__(param_args=param_args, node_feature_only=False)
        # ... initialization code ...
The original test_step of the TSPModel logs the following info:
def test_step(self, batch, batch_idx, split='test'):
    # ... test code ...
    metrics = {
        f"{split}/gt_cost": gt_cost,
        f"{split}/2opt_iterations": ns,
        f"{split}/merge_iterations": merge_iterations,
    }
    for k, v in metrics.items():
        self.log(k, v, on_epoch=True, sync_dist=True)
    self.log(f"{split}/solved_cost", best_solved_cost, prog_bar=True, on_epoch=True, sync_dist=True)
    return metrics
The 2.2.4 documentation specifies that a "loss" key must be provided in the returned metrics, so I added one:
metrics = {
    "loss": abs(best_solved_cost - gt_cost) / best_solved_cost,
    f"{split}/gt_cost": gt_cost,
    f"{split}/2opt_iterations": ns,
    f"{split}/merge_iterations": merge_iterations,
}
for k, v in metrics.items():
    self.log(k, v, prog_bar=True, on_epoch=True, sync_dist=True)
self.log(
    f"{split}/solved_cost",
    best_solved_cost,
    prog_bar=True,
    on_epoch=True,
    sync_dist=True,
)
self.outputs.append(metrics)
return metrics
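For context, this is roughly how the test stage is launched (a simplified sketch, not the repo's actual training script; the project name, args, and checkpoint path are placeholders):

from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger(project="tsp-example")      # placeholder project name
model = TSPModel(param_args=args)                      # args parsed elsewhere
trainer = Trainer(accelerator="auto", devices=1, logger=wandb_logger)
trainer.test(model, ckpt_path="path/to/checkpoint.ckpt")  # placeholder path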
When running train/test, the test metrics don’t seem to be logged, or at least they don’t reach the wandb dashboard: each test metric shows up as only a single point.
The train and val metric graphs make sense, so the problem appears to be specific to the test stage.
Why are the test metrics not being logged/displayed? Did I do something wrong in the on_test_epoch_end hook? Am I missing some other detail in the 1.7.7 to 2.2.4 migration?
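In case it helps, here is a stripped-down, self-contained sketch of the same logging pattern (dummy data, a toy metric, and a CSVLogger instead of wandb; the class and metric names are hypothetical, not from the repo):

import numpy as np
import torch
import pytorch_lightning as pl
from pytorch_lightning.loggers import CSVLogger
from torch.utils.data import DataLoader, TensorDataset


class MinimalTestLogging(pl.LightningModule):
    """Tiny module mimicking the test_step / on_test_epoch_end pattern above."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)
        self.outputs = []  # per-batch test metrics, as in COMetaModel

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        self.log("train/loss", loss)
        return loss

    def test_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        metrics = {"loss": loss.item(), "test/dummy_cost": float(y.mean())}
        for k, v in metrics.items():
            self.log(k, v, prog_bar=True, on_epoch=True, sync_dist=True)
        self.outputs.append(metrics)
        return metrics

    def on_test_epoch_end(self):
        # Merge the accumulated per-batch metrics and log their means.
        merged = {k: float(np.mean([m[k] for m in self.outputs])) for k in self.outputs[0]}
        self.logger.log_metrics(merged, step=self.global_step)
        self.outputs.clear()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=1e-2)


if __name__ == "__main__":
    data = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))
    loader = DataLoader(data, batch_size=8)
    model = MinimalTestLogging()
    trainer = pl.Trainer(max_epochs=1, logger=CSVLogger("logs"), log_every_n_steps=1)
    trainer.fit(model, loader)
    trainer.test(model, loader)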