Strange checkpoint loading and learning behaviour

Yes. I saw that Lightning prefers slightly different terminology, but that one is personal history: the scaling in there is based on the data, so `test_loss == 1` corresponds roughly to an RMSE of 100% of the data range.

    # assumes the usual alias: import torch.nn.functional as F
    def test_step(self, batch, batch_idx):
        x, y = batch
        z = self(x)              # forward pass

        loss = F.mse_loss(z, y)  # MSE used for optimisation

        # Log the RMSE divided by the data range (5e-5), so a logged
        # value of 1 means an error of roughly 100% of the data range.
        self.log('test_loss', loss.detach().sqrt() * (1 / 5e-5))

        return loss
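To illustrate the scaling being discussed, here is a minimal, framework-free sketch (not from the original post; the `5e-5` data range and the helper name are assumptions for illustration) showing why dividing the RMSE by the data range makes a logged value of 1 mean an error spanning roughly the whole range:

```python
import math

DATA_RANGE = 5e-5  # assumed range of the (unscaled) target values


def normalized_rmse(preds, targets):
    """RMSE expressed as a fraction of DATA_RANGE, mirroring the
    `loss.detach().sqrt() * (1 / 5e-5)` scaling in the logged test_loss."""
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    return math.sqrt(mse) * (1 / DATA_RANGE)


# An error of exactly 5e-5 on every sample normalizes to ~1.0,
# i.e. the prediction is off by about the full data range:
print(normalized_rmse([5e-5, 1e-4], [0.0, 5e-5]))  # ≈ 1.0
```

This keeps the trained loss in raw units while making the logged metric interpretable at a glance.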