In a transfer learning setting, I want to freeze the body and only train the head for 2 epochs. Then I want to unfreeze the whole network, run the learning rate finder, and continue training.
What I want to do is similar to FastAI’s fit_one_cycle.
To do the same with PyTorch Lightning, I tried the following:
from pytorch_lightning import Trainer

trainer = Trainer(max_epochs=2, min_epochs=0, auto_lr_find=True)
trainer.fit(model, data_module)  # FastAI: learn.fit_one_cycle(2)

trainer.max_epochs = 5
# model.unfreeze()    # allow the whole body to be trained
# trainer.tune(model) # LR finder
trainer.fit(model, data_module)  # FastAI: learn.fit_one_cycle(3)
Unfortunately, this runs the training of epoch 1 twice: the second fit call starts counting epochs from the beginning again.
Any ideas of how to approach what I want to do?
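For reference, this is roughly how the freeze/unfreeze could be set up (a minimal sketch only; TransferModel, backbone, and head are placeholder names, not the actual code from my Colab):

import torch.nn as nn
import pytorch_lightning as pl

class TransferModel(pl.LightningModule):
    def __init__(self, backbone: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone  # pretrained body
        self.head = head          # new task-specific head
        self.freeze_backbone()

    def freeze_backbone(self):
        # Stop gradient updates for the body; only the head trains.
        for p in self.backbone.parameters():
            p.requires_grad = False

    def unfreeze_backbone(self):
        # Re-enable gradients for the whole network.
        for p in self.backbone.parameters():
            p.requires_grad = True

    def forward(self, x):
        # training_step / configure_optimizers omitted for brevity
        return self.head(self.backbone(x))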
Edit 1: Link to Colab demonstrating 2x epoch 1: Google Colab
@s-rog Thank you! This does indeed seem to solve the epoch logging issue.
Now I’m trying to invoke trainer.tune(model) after training for the first 2 epochs.
However, this fails:
LR finder stopped early due to diverging loss.
Failed to compute suggesting for `lr`. There might not be enough points.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/tuner/lr_finder.py", line 340, in suggestion
min_grad = np.gradient(loss).argmin()
File "<__array_function__ internals>", line 6, in gradient
File "/usr/local/lib/python3.6/dist-packages/numpy/lib/function_base.py", line 1042, in gradient
"Shape of array too small to calculate a numerical gradient, "
ValueError: Shape of array too small to calculate a numerical gradient, at least (edge_order + 1) elements are required.
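One thing I’m considering as a workaround is calling the finder directly with a narrower search range, roughly like this (a sketch only; argument names such as min_lr, max_lr, and num_training may differ between Lightning versions):

# Run the LR finder explicitly instead of going through trainer.tune(model);
# a smaller max_lr may avoid the early stop caused by the diverging loss.
lr_finder = trainer.tuner.lr_find(
    model,
    datamodule=data_module,
    min_lr=1e-7,
    max_lr=1e-1,       # lower upper bound than the default of 1.0
    num_training=100,  # number of learning rates to try
)
model.lr = lr_finder.suggestion()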
I can confirm that the learning rate finder has never been tested in such a way, but I completely agree that we should probably support it.
I am planning to do a bit of refactoring of the tuning interface within the next month, which will hopefully solve this problem.
I don’t think assigning max_epochs = 5 like this will work, because the Trainer state (e.g. global_step, loggers, etc.) is not reset. Try creating the Trainer instance again with max_epochs=5, reload the model weights if required after the first fit cycle, and then continue with .tune.
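A minimal sketch of that suggestion (assuming a LightningModule class named MyModel and a checkpoint path of our choosing; neither is from the thread):

# Stage 1: head-only training for 2 epochs.
trainer = Trainer(max_epochs=2)
trainer.fit(model, data_module)
trainer.save_checkpoint("stage1.ckpt")

# Stage 2: a fresh Trainer, so global_step, loggers, etc. start clean.
model = MyModel.load_from_checkpoint("stage1.ckpt")
model.unfreeze()  # allow the whole body to be trained

# 3 more epochs; the fresh Trainer counts epochs from 0 again.
trainer = Trainer(max_epochs=3, auto_lr_find=True)
trainer.tune(model, datamodule=data_module)  # runs the LR finder
trainer.fit(model, data_module)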
@williamfalcon Thank you for the video and Colab. I will take a look at it. The idea that freezing for an epoch or two helps with transfer learning came from the FastAI 2 course. I’ll check which approach works better in combination with the LR finder once I get it to work.
@goku Thank you for the suggestion. I haven’t tried it yet, but it sounds like a workaround that would work.