I’m working with mixed precision training. My loss conceptually has two components, loss1 and loss2. For loss1 I call
self.manual_backward(loss1, retain_graph=True)
which populates .grad on all parameters. For loss2, I compute the gradients myself with torch.autograd.grad(). The problem is that self.manual_backward() applies the AMP loss scaling before backpropagating, so the gradients it writes into .grad are scaled while mine are not, and my manually computed gradients end up much smaller in magnitude. I want to apply the same scaling to my manually computed gradients, and I’m looking for the function that performs this scaling, so that I can accumulate them like
param.grad += scaled_gradient
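
To make it concrete, here is a minimal sketch of what I have in mind. It assumes the native AMP GradScaler is reachable at self.trainer.precision_plugin.scaler (I’m not sure this is the supported way to get it), and it calls scaler.scale() on loss2 before differentiating so the manual gradients carry the same scale factor as the ones written by manual_backward(); compute_losses() is just a stand-in for my own loss code:

import torch

def training_step(self, batch, batch_idx):
    opt = self.optimizers()
    opt.zero_grad()

    loss1, loss2 = self.compute_losses(batch)  # placeholder for my own code

    # Lightning scales loss1 with the AMP GradScaler internally,
    # so param.grad now holds scaled gradients.
    self.manual_backward(loss1, retain_graph=True)

    # Scale loss2 with the same GradScaler before differentiating,
    # so these gradients match the magnitude of those in param.grad.
    scaler = self.trainer.precision_plugin.scaler  # assumption: where the scaler lives
    params = [p for p in self.parameters() if p.requires_grad]
    grads = torch.autograd.grad(scaler.scale(loss2), params, allow_unused=True)

    # Accumulate the scaled manual gradients into param.grad.
    for p, g in zip(params, grads):
        if g is not None:
            p.grad = g if p.grad is None else p.grad + g

    # Stepping through the Lightning optimizer wrapper should unscale
    # the gradients before the actual optimizer step.
    opt.step()

Alternatively, I suppose I could leave loss2 unscaled and instead multiply the raw gradients by scaler.get_scale() (which returns the current scale factor) before adding them, which would give the same magnitudes. Is one of these the intended approach?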