Test results not logged to wandb after changing test_epoch_end to on_test_epoch_end

Hello,

I’ve cloned this repository, which uses pytorch_lightning==1.7.7.

When updating pytorch_lightning to 2.2.4, the only breaking change I ran into was the use of the test_epoch_end hook, which was removed in Lightning 2.0:

def test_epoch_end(self, outputs):
    unmerged_metrics = {}
    for metrics in outputs:
        for k, v in metrics.items():
            if k not in unmerged_metrics:
                unmerged_metrics[k] = []
            unmerged_metrics[k].append(v)

    merged_metrics = {}
    for k, v in unmerged_metrics.items():
        merged_metrics[k] = float(np.mean(v))
    self.logger.log_metrics(merged_metrics, step=self.global_step)

Following this PR mentioned in the documentation, I switched test_epoch_end to on_test_epoch_end:

def on_test_epoch_end(self):
    unmerged_metrics = {}
    for metrics in self.outputs:
        for k, v in metrics.items():
            if k not in unmerged_metrics:
                unmerged_metrics[k] = []
            unmerged_metrics[k].append(v)

    merged_metrics = {}
    for k, v in unmerged_metrics.items():
        merged_metrics[k] = float(np.mean(v))
    self.logger.log_metrics(merged_metrics, step=self.global_step)
    self.outputs.clear()

and added the outputs list to the parent class of TSPModel:

class COMetaModel(pl.LightningModule): # Parent class of TSPModel
    def __init__(self, param_args, node_feature_only=False):
        super(COMetaModel, self).__init__()
        # ... Initialization code ...
        self.outputs = []

The constructor of TSPModel:

class TSPModel(COMetaModel):
    def __init__(self, param_args=None):
        super(TSPModel, self).__init__(param_args=param_args, node_feature_only=False) 
        # ...initialization code....
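To check my understanding, here is the whole pattern reduced to a minimal self-contained model. The class name, metric name, and placeholder value below are mine, not the repository’s; this is just how I read the 2.x replacement for the removed hook:

import numpy as np
import pytorch_lightning as pl


class MinimalTestModel(pl.LightningModule):
    # Illustrative only: the 2.x pattern of collecting step outputs by hand.

    def __init__(self):
        super().__init__()
        self.outputs = []  # replaces the `outputs` argument removed in 2.0

    def test_step(self, batch, batch_idx):
        metrics = {"test/some_metric": float(np.random.rand())}  # placeholder
        self.outputs.append(metrics)  # each step appends its own metrics
        return metrics

    def on_test_epoch_end(self):
        # Average each metric across all test steps, then free the memory.
        merged = {k: float(np.mean([m[k] for m in self.outputs]))
                  for k in self.outputs[0]}
        self.logger.log_metrics(merged, step=self.global_step)
        self.outputs.clear()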

The original test_step of TSPModel logs the following metrics:

def test_step(self, batch, batch_idx, split='test'):
    # test code ...
    metrics = {
        f"{split}/gt_cost": gt_cost,
        f"{split}/2opt_iterations": ns,
        f"{split}/merge_iterations": merge_iterations,
    }
    for k, v in metrics.items():
        self.log(k, v, on_epoch=True, sync_dist=True)
    self.log(f"{split}/solved_cost", best_solved_cost, prog_bar=True, on_epoch=True, sync_dist=True)
    return metrics

The 2.2.4 documentation specifies that a “loss” key must be provided in the returned metrics, so I added one:

    metrics = {
        "loss": abs(best_solved_cost - gt_cost) / best_solved_cost,
        f"{split}/gt_cost": gt_cost,
        f"{split}/2opt_iterations": ns,
        f"{split}/merge_iterations": merge_iterations,
    }
    for k, v in metrics.items():
        self.log(k, v, prog_bar=True, on_epoch=True, sync_dist=True)
    self.log(
        f"{split}/solved_cost",
        best_solved_cost,
        prog_bar=True,
        on_epoch=True,
        sync_dist=True,
    )
    self.outputs.append(metrics)
    return metrics
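For completeness, this is roughly how the test is launched. The project name is a placeholder, and `model` / `test_loader` stand in for the repository’s actual objects. Since every metric in test_step is logged with on_epoch=True, my understanding is that trainer.test() should also return the epoch-level aggregates:

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

# `model` and `test_loader` stand in for the repository's actual objects.
wandb_logger = WandbLogger(project="tsp")  # placeholder project name
trainer = pl.Trainer(logger=wandb_logger)

# trainer.test() returns one dict per test dataloader, containing the
# epoch-level aggregates of everything logged with on_epoch=True.
results = trainer.test(model, dataloaders=test_loader)
print(results)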

When running train/test, the test metrics don’t seem to be logged, or at least not to the wandb platform: each test metric shows up as only a single point.

The train and val metric graphs look as expected, so the problem seems specific to the test run.
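One detail I noticed while debugging, in case it is related: as far as I understand, self.global_step counts optimizer steps, so it does not advance during testing, and the explicit step I pass in on_test_epoch_end is whatever training left it at:

# global_step is frozen during trainer.test(), so this call always logs
# at the same wandb step, which would render as a single point:
self.logger.log_metrics(merged_metrics, step=self.global_step)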

Why are the test metrics not being logged/displayed? Did I do something wrong in the on_test_epoch_end hook? Am I missing some other detail in the migration from 1.7.7 to 2.2.4?