I’m working with mixed precision training. My loss conceptually has two components, loss1 and loss2. For loss1 I call
self.manual_backward(loss1, retain_graph=True)
which populates .grad on all parameters. For loss2, I compute the gradients myself with torch.autograd.grad(). The problem is that self.manual_backward() applies the AMP loss scaling before backpropagating, so the gradients it writes into .grad are scaled while mine are not, and my manually computed gradients end up much smaller in magnitude. I want to apply the same scaling to my manually computed gradients, and I’m looking for the function that performs this scaling, so that I can accumulate them like
param.grad += scaled_gradient
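
To make it concrete, here is a minimal sketch of what I have in mind. It assumes the native AMP GradScaler is reachable at self.trainer.precision_plugin.scaler (I’m not sure this is the supported way to get it), and it calls scaler.scale() on loss2 before differentiating so the manual gradients carry the same scale factor as the ones written by manual_backward(); compute_losses() is just a stand-in for my own loss code:

import torch

def training_step(self, batch, batch_idx):
    opt = self.optimizers()
    opt.zero_grad()

    loss1, loss2 = self.compute_losses(batch)  # placeholder for my own code

    # Lightning scales loss1 with the AMP GradScaler internally,
    # so param.grad now holds scaled gradients.
    self.manual_backward(loss1, retain_graph=True)

    # Scale loss2 with the same GradScaler before differentiating,
    # so these gradients match the magnitude of those in param.grad.
    scaler = self.trainer.precision_plugin.scaler  # assumption: where the scaler lives
    params = [p for p in self.parameters() if p.requires_grad]
    grads = torch.autograd.grad(scaler.scale(loss2), params, allow_unused=True)

    # Accumulate the scaled manual gradients into param.grad.
    for p, g in zip(params, grads):
        if g is not None:
            p.grad = g if p.grad is None else p.grad + g

    # Stepping through the Lightning optimizer wrapper should unscale
    # the gradients before the actual optimizer step.
    opt.step()

Alternatively, I suppose I could leave loss2 unscaled and instead multiply the raw gradients by scaler.get_scale() (which returns the current scale factor) before adding them, which would give the same magnitudes. Is one of these the intended approach?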