My network is composed of several modules, only some of which get called each iteration. When I run my code, I get an error, probably because there aren’t gradients flowing to each module every time:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Is there any easy way to just skip over the optimizers for modules that aren’t being used?
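Roughly, the setup looks like this; the module names, sizes, and routing condition are hypothetical placeholders, not my actual code:

```python
import torch
import torch.nn as nn

class Router(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch_a = nn.Linear(16, 16)
        self.branch_b = nn.Linear(16, 16)
        self.head = nn.Linear(16, 1)

    def forward(self, x, use_a: bool):
        # Only one branch runs per iteration.
        h = self.branch_a(x) if use_a else self.branch_b(x)
        return self.head(h)

model = Router()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 16)
loss = model(x, use_a=True).mean()
loss.backward()      # only branch_a and head receive gradients
optimizer.step()
optimizer.zero_grad()
```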
The optimizer will not skip any parameters. But if a layer has requires_grad set to False, or if it isn't used in the forward pass, its grad will of course be zero (or None), so the optimizer step won't change those weights.
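A minimal sketch of that behaviour (the layer names here are hypothetical): a layer that never participates in the forward pass ends up with grad None, and optimizer.step() leaves its weights untouched.

```python
import torch
import torch.nn as nn

used = nn.Linear(4, 4)
unused = nn.Linear(4, 4)
opt = torch.optim.SGD(list(used.parameters()) + list(unused.parameters()), lr=0.1)

before = unused.weight.detach().clone()
loss = used(torch.randn(2, 4)).sum()   # `unused` never participates
loss.backward()

print(unused.weight.grad)                   # None: no gradient flowed to it
opt.step()
print(torch.equal(unused.weight, before))   # True: weights unchanged
```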
The error you are getting is probably because either you are using .detach() somewhere, which breaks the backward flow, or all of your model's parameters have requires_grad=False.
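Both cases can be reproduced with a toy example like this (a hypothetical minimal model, not your code); either one makes the loss a tensor with no grad_fn, which triggers exactly that RuntimeError when you call backward():

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)

# 1) All parameters frozen: the loss has no grad_fn, so backward() fails.
for p in model.parameters():
    p.requires_grad_(False)
loss = model(torch.randn(2, 4)).sum()
# loss.backward()  # RuntimeError: element 0 of tensors does not require grad ...

# 2) .detach() cuts the graph between the model output and the loss.
for p in model.parameters():
    p.requires_grad_(True)
loss = model(torch.randn(2, 4)).detach().sum()
# loss.backward()  # same RuntimeError
```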