Mixed precision training (how to appropriately scale manually computed gradients)

I’m working with mixed precision training. My loss has conceptually two components: loss1 and loss2. I call

self.manual_backward(loss1, retain_graph=True)

This populates the .grad attribute of all parameters. For loss2, I compute the gradients manually using torch.autograd.grad(). The problem is that self.manual_backward() applies some scaling (presumably via the AMP GradScaler) before backpropagating, so the gradients it writes are scaled up, and the gradients I compute myself end up much smaller in magnitude by comparison. I want to apply the same scaling to my manually computed gradients, and I'm looking for the function that performs that scaling. I plan to update the gradients like

param.grad += scaled_gradient
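
To make this concrete, here is a sketch of what I'm trying to do. I'm assuming the underlying AMP GradScaler is reachable as self.trainer.precision_plugin.scaler (the exact attribute path may differ between Lightning versions), and that calling GradScaler.scale() on loss2 before torch.autograd.grad() would put my gradients on the same scale as those produced by manual_backward(). compute_losses() is just a placeholder standing in for my real loss computation:

```python
import torch
import lightning.pytorch as pl


class MyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False  # manual optimization
        self.net = torch.nn.Linear(10, 1)

    def compute_losses(self, batch):
        # Placeholder losses standing in for my real loss1/loss2.
        x, y = batch
        pred = self.net(x)
        loss1 = torch.nn.functional.mse_loss(pred, y)
        loss2 = pred.pow(2).mean()
        return loss1, loss2

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()

        loss1, loss2 = self.compute_losses(batch)

        # manual_backward() scales loss1 internally before backward,
        # so param.grad now holds *scaled* gradients.
        self.manual_backward(loss1, retain_graph=True)

        # Assumption: the AMP GradScaler is reachable here; the attribute
        # path may vary across Lightning versions. It is None when not
        # training with mixed precision.
        scaler = self.trainer.precision_plugin.scaler
        loss2_scaled = scaler.scale(loss2) if scaler is not None else loss2

        # Gradients of the scaled loss2 should now match the magnitude
        # of the gradients written by manual_backward().
        params = [p for p in self.net.parameters() if p.requires_grad]
        grads = torch.autograd.grad(loss2_scaled, params)

        # Accumulate the (now consistently scaled) gradients.
        for p, g in zip(params, grads):
            p.grad = g if p.grad is None else p.grad + g

        opt.step()

    def configure_optimizers(self):
        return torch.optim.SGD(self.net.parameters(), lr=0.01)
```

My understanding is that opt.step() then routes through the precision plugin, so the combined scaled gradients get unscaled by the scaler before the actual parameter update. Is that right, and is GradScaler.scale() the intended function for this?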