Implementing Gradient Skipping

Hi, I would like to implement gradient skipping in PL, i.e. skipping training updates with a gradient norm above a certain threshold.

In other words,

  1. Calculate gradient norm of model parameters
  2. If the gradient norm exceeds the threshold, skip the call to optimizer.step()

Any advice on the recommended way to implement this in the LightningModule?
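
For reference, in plain PyTorch the two steps would look roughly like the sketch below (step_unless_large is a hypothetical helper; model, optimizer, and thresh are placeholders, and the norm computation is just one way to do it):

import torch

def step_unless_large(model, optimizer, thresh):
    # 1. total L2 norm over all parameter gradients
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([torch.norm(g) for g in grads]))
    # 2. only apply the update when the norm is at or below the threshold
    if grad_norm <= thresh:
        optimizer.step()
    optimizer.zero_grad()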

You can override optimizer_step:

def optimizer_step(self, *args, **kwargs):
    # total L2 norm over all parameter gradients
    grads = [p.grad for p in self.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([torch.norm(g) for g in grads]))
    if grad_norm <= thresh:  # skip the update when the norm is too large
        super().optimizer_step(*args, **kwargs)

Thanks for your reply!

However, if I were to extract the gradient information within the optimizer_step method, using something like this:

def optimizer_step(self, *args, **kwargs):
    # Problem: all p.grad are tensors of zeros at this point
    grads = [p.grad for p in self.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([torch.norm(g) for g in grads]))

    if grad_norm <= thresh:
        super().optimizer_step(*args, **kwargs)

All the parameters appear to have zero accumulated gradients. What could be happening here? Is loss.backward() not yet called at this point?

Ok yeah… the backward pass happens inside the closure that Lightning passes to optimizer.step(), so the gradients haven't been computed yet when optimizer_step is entered. Need to find a better alternative.
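
One possible workaround might be to run the closure by hand before checking the norm. This is only a rough sketch, assuming a Lightning version whose optimizer_step receives the closure as an optimizer_closure argument (the exact signature differs between releases), and it probably won't play nicely with native AMP or LBFGS:

def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                   optimizer_closure, **kwargs):
    # run forward + backward first so the gradients are actually populated
    optimizer_closure()
    grads = [p.grad for p in self.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([torch.norm(g) for g in grads]))
    if grad_norm <= thresh:  # only apply the update for reasonable norms
        optimizer.step()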

Hi there! Same problem: I would like to skip the gradient and optimization computation. For this I use manual optimization with manual_backward (see this). It works, but then the loss is shown as “nan” in the progress bar. Anyway, here is an example:

def training_step(self, batch, batch_nb):
    loss_dict = self.compute_loss(...)

    # skip this batch entirely if no loss could be computed
    if loss_dict is None:
        return

    loss = sum(loss_dict.values())

    # manual optimization: backward and step by hand
    opt = self.optimizers()
    self.manual_backward(loss, opt)
    self.manual_optimizer_step(opt)

    logs = {'loss': loss}
    logs.update({'train_' + k: v.item() for k, v in loss_dict.items()})

    return logs
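
Note that this pattern requires manual optimization to be enabled; depending on the Lightning version this is done either via a Trainer flag or a module attribute, roughly as in the sketch below (the exact spelling varies by release):

import pytorch_lightning as pl

# older releases: pass the flag to the Trainer
trainer = pl.Trainer(automatic_optimization=False)

# newer releases: set the property on the LightningModule
class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False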

I’ve also tried this, and I get the same result: the loss is logged as “nan”. Haven’t quite figured out why this is so…

Here is a simple example in Colab.

Hello, I have read the solution in the GitHub issues. For the loss showing up as “nan” in the tqdm progress bar, you can just do:

self.trainer.train_loop.running_loss.append(loss)
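
In context, that line would go at the end of the manual training_step, roughly like the sketch below (train_loop.running_loss is an internal attribute of the training loop in the versions discussed here, so this may break across releases; compute_loss stands in for the loss helper from the example above):

def training_step(self, batch, batch_nb):
    opt = self.optimizers()
    loss = sum(self.compute_loss(batch).values())  # hypothetical helper, as above
    self.manual_backward(loss, opt)
    self.manual_optimizer_step(opt)
    # feed the running loss so the progress bar shows a number instead of nan
    self.trainer.train_loop.running_loss.append(loss)
    return {'loss': loss}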

Hello there, it seems manual optimization does not work with native AMP in the newest version. Do we need an extra step, like overriding the backward method with the code from “Introducing native PyTorch automatic mixed precision for faster training on NVIDIA GPUs” on the PyTorch blog?
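
For reference, the raw-PyTorch pattern from that blog post, combined with the gradient-norm check from this thread, looks roughly like the sketch below (model, the loss function, and thresh are placeholders); how to hook this into manual_backward for a given Lightning version is exactly the open question here:

import torch

scaler = torch.cuda.amp.GradScaler()

def amp_train_step(model, optimizer, inputs, targets, thresh):
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()
    # unscale so the gradient norm is measured at its true scale
    scaler.unscale_(optimizer)
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([torch.norm(g) for g in grads]))
    if grad_norm <= thresh:
        scaler.step(optimizer)  # skipped entirely when the norm is too large
    scaler.update()
    optimizer.zero_grad()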