I’m training a GPT2 network, and my `configure_optimizers()` is as follows:
```python
def configure_optimizers(self):
    opt = optim.Adam(self.model.parameters(), self.lr)
    total_steps = (
        len(self.trainer.datamodule.train_dataset) * self.trainer.max_epochs
        // self.trainer.datamodule.train_batch_size
        // (self.trainer.num_devices * self.trainer.accumulate_grad_batches)
    )
    # (rest of the method, which builds the lr scheduler and returns, omitted)
```
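For reference, the `total_steps` integer arithmetic can be sanity-checked standalone; the dataset size, batch size, device count, and accumulation factor below are made-up numbers, not my real config:

```python
# Hypothetical values, just to sanity-check the total_steps formula above.
train_dataset_len = 10_000        # len(train_dataset)
max_epochs = 3
train_batch_size = 8
num_devices = 2
accumulate_grad_batches = 4

# Same integer arithmetic as in configure_optimizers():
total_steps = (
    train_dataset_len * max_epochs
    // train_batch_size
    // (num_devices * accumulate_grad_batches)
)
print(total_steps)  # → 468
```

Note that `//` is floor division, so the result is rounded down at each step (30000 // 8 = 3750, then 3750 // 8 = 468).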
Due to some issues with my training data, I interrupted training. After resuming (by specifying `ckpt_path`), I noticed something odd about the learning rate: calling `self.optimizers().param_groups[0]['lr']` always returns 0, while `self.lr_schedulers().optimizer.param_groups[0]['lr']` returns the expected lr.
Through debugging, I found that the former call returns a `LightningAdam` instance, while the latter returns PyTorch’s own `Adam` instance. So which optimizer is actually used to update the model: the one from `self.lr_schedulers().optimizer`, or the one from `self.optimizers()`? And does this mean there is a bug in how lr schedulers are saved and restored?
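For what it’s worth, whether the two read paths can disagree depends on whether the wrapper returned by `self.optimizers()` delegates `param_groups` to the underlying optimizer or keeps its own copy. Here is a stdlib-only sketch of the delegation pattern (the class names are illustrative stand-ins, not Lightning’s or PyTorch’s real classes):

```python
# Stdlib-only sketch of the optimizer-wrapper delegation pattern.
# "InnerOptimizer" stands in for torch.optim.Adam, "OptimizerWrapper" for
# Lightning's wrapper, and "Scheduler" for a torch LR scheduler; all names
# here are illustrative, not the real library classes.

class InnerOptimizer:
    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]

class OptimizerWrapper:
    """Wraps an optimizer and delegates param_groups to it."""
    def __init__(self, optimizer):
        self.optimizer = optimizer

    @property
    def param_groups(self):
        # Delegation: reads go straight to the wrapped optimizer,
        # so the wrapper can never report a stale lr.
        return self.optimizer.param_groups

class Scheduler:
    """A scheduler keeps a reference to the optimizer it mutates."""
    def __init__(self, optimizer):
        self.optimizer = optimizer

    def step(self):
        self.optimizer.param_groups[0]["lr"] *= 0.5

inner = InnerOptimizer(lr=1e-3)
wrapped = OptimizerWrapper(inner)
sched = Scheduler(inner)

sched.step()
# Both read paths hit the same param_groups list, so they must agree:
assert wrapped.param_groups[0]["lr"] == sched.optimizer.param_groups[0]["lr"]
```

Under this pattern the two lr readings cannot diverge; the fact that they do in my run (0 vs. the expected lr) is what makes me suspect the resumed wrapper is not pointing at the same optimizer instance as the scheduler.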