Trouble loading checkpoints (?)

After further investigation, I don’t think the issue is with loading the weights.
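
For anyone wanting to rule out the loading step in the same way, here is a minimal sketch of the kind of check that could be used. It is framework-agnostic and rests on assumptions: the helper name `diff_weights` is hypothetical, and it expects each set of weights as a plain dict mapping a parameter name to a flat list of floats (for example, something like `tensor.flatten().tolist()` applied to every entry before saving and after loading).

```python
import math


def diff_weights(saved, loaded, abs_tol=0.0):
    """Compare two {name: flat list of floats} dicts and describe any differences.

    `saved` is a snapshot taken just before writing the checkpoint,
    `loaded` is what the model holds after reading it back.
    An empty result means the two sets of weights match.
    """
    problems = []
    for name in sorted(set(saved) | set(loaded)):
        if name not in loaded:
            problems.append(f"{name}: missing after loading")
        elif name not in saved:
            problems.append(f"{name}: not present in the checkpoint")
        elif len(saved[name]) != len(loaded[name]):
            problems.append(
                f"{name}: size {len(saved[name])} vs {len(loaded[name])}"
            )
        elif not all(
            math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)
            for a, b in zip(saved[name], loaded[name])
        ):
            problems.append(f"{name}: values differ")
    return problems


# Toy example with made-up parameter names and values.
saved = {"layer.weight": [0.1, 0.2], "layer.bias": [0.0]}
loaded = {"layer.weight": [0.1, 0.2], "layer.bias": [0.5]}
print(diff_weights(saved, loaded))
```

If this returns an empty list on the real weights, the checkpoint round trip is probably fine and the discrepancy has to come from somewhere later in the pipeline.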

This is even more confusing, since I’m using exactly the same evaluation metrics during training and when evaluating the final predictions. Maybe the error is in how I compute the predictions (?).