When I was implementing a GAN with Lightning, I train the discriminator first and wanted to reuse the generated fake images for training the generator. So I used `.detach()` in the first training step like this:
```python
class DCGAN(pl.LightningModule):
    ...
    def training_step(self, batch, batch_idx, optimizer_idx):
        imgs, _ = batch

        if optimizer_idx == 0:
            # Train discriminator
            z = torch.randn(imgs.shape[0], self.hparams.latent_dim, device=self.device)
            self.generated_imgs = self(z)  # generate fake images
            disc_real = self.disc(imgs)
            disc_fake = self.disc(self.generated_imgs.detach())  # this should have the effect of retaining the computational graph
            real_loss = self.criterion(disc_real, torch.ones_like(disc_real))
            fake_loss = self.criterion(disc_fake, torch.zeros_like(disc_fake))
            d_loss = (real_loss + fake_loss) / 2
            self.log('loss/disc', d_loss, on_epoch=True, prog_bar=True)
            return d_loss

        if optimizer_idx >= 1:
            # Train generator
            disc_fake = self.disc(self.generated_imgs)
            g_loss = self.criterion(disc_fake, torch.ones_like(disc_fake))
            self.log('loss/gen', g_loss, on_epoch=True, prog_bar=True)
            return g_loss
```
But I get:

```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
I have found the problem: the graph built by `generated_imgs = self(z)` was also released in the first training step even with `.detach()`. (According to @goku, `.detach()` is not the point here; rather, the gradients of the generator will not be calculated when `optimizer_idx == 0`.)
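To see what that explanation means in isolation, here is a minimal plain-PyTorch sketch (with a toy `nn.Linear` standing in for the generator, not my actual model): when a module's parameters have `requires_grad` turned off, as Lightning's optimizer toggling does for the generator during the discriminator step, the forward pass records no graph at all, and calling `backward()` on any loss built from that output raises the exact same error:

```python
import torch
import torch.nn as nn

gen = nn.Linear(4, 4)   # toy stand-in for the generator
z = torch.randn(2, 4)

# Simulate Lightning toggling the generator off during the discriminator step
for p in gen.parameters():
    p.requires_grad = False

fake = gen(z)           # no graph is recorded: nothing here requires grad
print(fake.grad_fn)     # None

loss = fake.sum()
try:
    loss.backward()     # same RuntimeError as in training_step
except RuntimeError as e:
    print(e)
```

So with automatic optimization, the `self.generated_imgs` saved in the `optimizer_idx == 0` branch carries no `grad_fn`, and the generator step has nothing to backpropagate through.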
However, if I instead use plain PyTorch or Lightning's manual optimization (https://pytorch-lightning.readthedocs.io/en/latest/optimizers.html#manual-optimization), everything works out fine.
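For reference, this is roughly the plain-PyTorch version that works for me (a sketch with made-up toy modules, not my real DCGAN): the fake batch is generated once, detached for the discriminator update, and then reused with its graph intact for the generator update.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the generator and discriminator
gen = nn.Linear(8, 4)
disc = nn.Linear(4, 1)
opt_g = torch.optim.Adam(gen.parameters())
opt_d = torch.optim.Adam(disc.parameters())
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(16, 4)
z = torch.randn(16, 8)
fake = gen(z)  # generated once; its graph stays alive

# Discriminator step: detach so gradients don't flow into the generator
opt_d.zero_grad()
disc_real = disc(real)
disc_fake = disc(fake.detach())
d_loss = (criterion(disc_real, torch.ones_like(disc_real))
          + criterion(disc_fake, torch.zeros_like(disc_fake))) / 2
d_loss.backward()
opt_d.step()

# Generator step: reuse the same fake batch, graph still intact
opt_g.zero_grad()
out = disc(fake)
g_loss = criterion(out, torch.ones_like(out))
g_loss.backward()  # works: gradients reach gen through the retained graph
opt_g.step()
```

Here nothing ever disables `requires_grad` on the generator, so `fake` keeps its `grad_fn` and the second `backward()` succeeds.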
So what happened in the first case?