How to switch optimizers during training

The way you are using LBFGS looks correct; I don't think the problem is there. In the MNIST case I only checked whether the weights were updating, and they were, although it doesn't converge as well as Adam does. You could try the same setup in native PyTorch code and see whether it converges there. If it does, there is probably a bug here that needs to be fixed.
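
For reference, here is a minimal sketch of switching from Adam to LBFGS partway through training. The model, the dummy data, and the `SWITCH_EPOCH` threshold are all illustrative (not from your code); the main point is that LBFGS needs a fresh optimizer instance over the same parameters and a closure passed to `step()`:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Dummy MNIST-shaped data so the sketch runs standalone.
data = TensorDataset(torch.randn(256, 784), torch.randint(0, 10, (256,)))
loader = DataLoader(data, batch_size=64)

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
SWITCH_EPOCH = 5  # illustrative: epoch at which to swap optimizers

for epoch in range(10):
    if epoch == SWITCH_EPOCH:
        # LBFGS keeps its own curvature history, so build it fresh over
        # the same parameters instead of trying to reuse Adam's state.
        optimizer = torch.optim.LBFGS(model.parameters(), lr=0.1)

    for inputs, targets in loader:
        if isinstance(optimizer, torch.optim.LBFGS):
            # LBFGS may re-evaluate the loss several times per step,
            # so it requires a closure that redoes forward/backward.
            def closure():
                optimizer.zero_grad()
                loss = criterion(model(inputs), targets)
                loss.backward()
                return loss
            optimizer.step(closure)
        else:
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
```

The branch on `isinstance` is just one way to handle the two calling conventions; the key difference is that `Adam.step()` takes no arguments while `LBFGS.step(closure)` needs the closure to recompute the loss.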