Gradient Manipulation for Multitasking

haldunbalim · April 19, 2022, 10:36pm

Hello,

I would like to implement [1812.02224] Adapting Auxiliary Losses Using Gradient Similarity. I have a main loss and several auxiliary losses, and I want an auxiliary loss to contribute to the optimizer step only if its gradient has positive cosine similarity with the gradient of the main task loss. To do this, I compute the gradient of each loss with torch.autograd.grad, compute the cosine similarities, and add only the selected auxiliary losses to the loss that is optimized. However, since I don’t know how to pass these precomputed gradients to the optimizer, I end up running the backward pass twice every step. How can I implement this efficiently?

Thanks a lot.
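
One possible way to avoid the second backward pass, sketched here under assumptions rather than taken from the paper's reference code: PyTorch optimizers read each parameter's .grad attribute, so the gradients already returned by torch.autograd.grad can be summed and written there directly before calling optimizer.step(). The helper names (grads_of, flatten, set_filtered_grads), the toy nn.Linear model, and the toy losses are hypothetical and only illustrate the "keep if cosine similarity > 0" rule described above. Each loss still needs its own backward pass; only the extra pass over the combined loss is removed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grads_of(loss, params):
    """Gradient of one loss w.r.t. params; unused parameters get zeros."""
    grads = torch.autograd.grad(loss, params, retain_graph=True, allow_unused=True)
    return [torch.zeros_like(p) if g is None else g for g, p in zip(grads, params)]


def flatten(grads):
    """Concatenate a list of gradient tensors into a single 1-D vector."""
    return torch.cat([g.reshape(-1) for g in grads])


def set_filtered_grads(params, main_loss, aux_losses):
    """Write main grad plus agreeing auxiliary grads into p.grad.

    Returns a list of booleans, one per auxiliary loss, telling whether it was kept.
    """
    params = [p for p in params if p.requires_grad]
    main_grads = grads_of(main_loss, params)
    main_flat = flatten(main_grads)
    total = [g.clone() for g in main_grads]
    kept = []
    for aux_loss in aux_losses:
        aux_grads = grads_of(aux_loss, params)
        cos = F.cosine_similarity(main_flat, flatten(aux_grads), dim=0)
        keep = cos.item() > 0
        kept.append(keep)
        if keep:
            # Reuse the gradient that was already computed for this loss.
            for t, g in zip(total, aux_grads):
                t.add_(g)
    # The optimizer only looks at p.grad, so no further backward() is needed.
    for p, g in zip(params, total):
        p.grad = g
    return kept


# Toy setup (assumption): one auxiliary loss agrees with the main loss, one opposes it.
torch.manual_seed(0)
model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
out = model(x)
main_loss = out[:, 0].pow(2).mean()
aux_losses = [0.5 * out[:, 0].pow(2).mean(), -out[:, 0].pow(2).mean()]

kept = set_filtered_grads(model.parameters(), main_loss, aux_losses)
optimizer.step()
print(kept)
```

Because p.grad is overwritten on every call, optimizer.zero_grad() is not required in this sketch. If gradient accumulation or mixed-precision scaling is in use, the assignment to p.grad would need to be adapted accordingly.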
