However, if I were to extract the gradient information within the optimizer_step method, using something like this:
def optimizer_step(self, *args, **kwargs):
    # All p.grad are tensors of zeros here
    parameters = [p for p in self.parameters() if p.grad is not None]
    # Total L2 norm over all parameter gradients
    grad_norm = torch.norm(torch.stack([p.grad.detach().norm(2) for p in parameters]), 2)
    if grad_norm > thresh:  # thresh is a threshold defined elsewhere
        super().optimizer_step(*args, **kwargs)
All the parameters appear to have zero accumulated gradients. What could be happening here? Is loss.backward() not yet called at this point?
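For reference, here is a minimal sketch of where I would expect the gradients to be populated instead, using the on_after_backward hook, which runs right after loss.backward() (the class name and the print are just for illustration):

import torch
import pytorch_lightning as pl

class GradNormDebugModule(pl.LightningModule):
    def on_after_backward(self):
        # This hook runs right after loss.backward(), so p.grad should be populated here
        grads = [p.grad.detach() for p in self.parameters() if p.grad is not None]
        if grads:
            total_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2)
            print(f"grad norm after backward: {total_norm.item():.4f}")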
Hi there! I have the same problem. I would like to skip the gradient and optimization computation for some batches, so I use manual optimization with manual_backward and manual_optimizer_step (see this). This works, but the loss is then shown as “nan” in the progress bar. Anyway, here is an example:
def training_step(self, batch, batch_nb):
    loss_dict = self.compute_loss(...)
    if loss_dict is None:
        return  # skip this batch: no backward, no optimizer step
    loss = sum([value for key, value in loss_dict.items()])
    opt = self.optimizers()
    self.manual_backward(loss, opt)
    self.manual_optimizer_step(opt)
    # Log the total loss plus each individual loss term
    logs = {'loss': loss}
    logs.update({'train_' + k: v.item() for k, v in loss_dict.items()})
    return logs
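For completeness, here is a minimal sketch of how I would write the same skip against the newer-style manual optimization API (self.automatic_optimization = False in __init__, opt.step()/opt.zero_grad() instead of manual_optimizer_step), with the grad-norm threshold from the original question folded in. compute_loss and grad_norm_thresh are placeholders of mine, not Lightning API:

import torch
import pytorch_lightning as pl

class SkipStepModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # enable manual optimization
        self.grad_norm_thresh = 1.0          # placeholder threshold

    def training_step(self, batch, batch_idx):
        loss_dict = self.compute_loss(batch)  # placeholder loss computation
        if loss_dict is None:
            return  # skip backward and optimizer step entirely
        loss = sum(loss_dict.values())
        opt = self.optimizers()
        self.manual_backward(loss)
        # Only step if the total gradient norm exceeds the threshold
        grads = [p.grad.detach() for p in self.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2)
        if grad_norm > self.grad_norm_thresh:
            opt.step()
        opt.zero_grad()
        self.log('train_loss', loss, prog_bar=True)
        return loss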