Performance Drop in PL Compared to PyTorch

Hi there,

I have moved my code from plain PyTorch to PL but cannot reproduce the results.
I have checked the data and the pretrained model's initial weights: there is a slight difference in the last decimal places of the data, but the initial weights are identical (roughly compared as in the sketch below).
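
How I compared the initial weights, as a minimal sketch; `TorchNet` and `LitNet` are placeholders for my two implementations:

```python
import torch

def compare_state_dicts(sd_a, sd_b, atol=0.0):
    """Return the names of parameters that differ between two state dicts."""
    mismatched = []
    for name, tensor_a in sd_a.items():
        if not torch.allclose(tensor_a, sd_b[name], atol=atol):
            mismatched.append(name)
    return mismatched

# Hypothetical usage with the two implementations:
# torch_model = TorchNet()   # plain-PyTorch model
# lit_model = LitNet()       # LightningModule wrapping the same network
# print(compare_state_dicts(torch_model.state_dict(), lit_model.state_dict()))
```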

Even though the results cannot be exactly the same, a performance drop of more than 20% doesn't make sense to me.

When I log the metrics to wandb, the training curves look similar.
However, the validation loss, which used to decrease in PyTorch, goes up in PL.
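
For context, my logging looks roughly like this (a simplified sketch; the model and loss are placeholders for my actual setup):

```python
import lightning.pytorch as pl
import torch
import torch.nn.functional as F

class LitModule(pl.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model  # placeholder for my actual network

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.model(x), y)
        self.log("train_loss", loss)  # picked up by the WandbLogger
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.model(x), y)
        # aggregated per epoch so the curve is comparable to my PyTorch run
        self.log("val_loss", loss, on_epoch=True, prog_bar=True)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```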

Any help?

  • PyTorch version: 2.0.1

  • PL version: 2.0.6

  • CUDA version: 11.8 (V11.8.89, build cuda_11.8.r11.8/compiler.31833905_0)

Thanks

Do you have a sample script that can be used to reproduce this? Also, I'd recommend upgrading to PL 2.1 and torch 2.1.
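
If it helps, a self-contained skeleton along these lines is usually enough (a sketch with random data; substitute your model, loss, and data):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import lightning.pytorch as pl

class ReproModule(pl.LightningModule):
    """Minimal module for demonstrating the train/val divergence."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)  # stand-in for the real model

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.layer(x), y)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val_loss", F.cross_entropy(self.layer(x), y))

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

if __name__ == "__main__":
    pl.seed_everything(42)  # fixed seed so runs are comparable
    train = DataLoader(TensorDataset(torch.randn(256, 32),
                                     torch.randint(0, 2, (256,))), batch_size=32)
    val = DataLoader(TensorDataset(torch.randn(64, 32),
                                   torch.randint(0, 2, (64,))), batch_size=32)
    trainer = pl.Trainer(max_epochs=3, log_every_n_steps=1)
    trainer.fit(ReproModule(), train, val)
```

Running this skeleton with both the `Trainer` and your hand-written PyTorch loop on the same seed makes it much easier to see exactly where the curves start to diverge.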