Hi, I am trying to use the Tuner to find the optimal batch size and learning rate for fine-tuning my model.
I ran the code a few days ago and it worked fine; then I had to make some adjustments to the model's training_step
method. After making those adjustments, I ran the Tuner code again and it has been stuck on batch size 4 ever since. Any ideas why this is happening?
The Tuner’s code is the following:
import os

from lightning.pytorch import Trainer
from lightning.pytorch.tuner import Tuner

# CLIPWrapper and CaptioningDataModule are my own LightningModule / LightningDataModule classes


def main(args):
    os.environ["TOKENIZERS_PARALLELISM"] = '1'

    model = CLIPWrapper(batch_size=args.batch_size)
    datamodule = CaptioningDataModule(data=args.data, num_workers=args.num_workers, batch_size=args.batch_size)
    trainer = Trainer(default_root_dir=args.default_root_dir, max_epochs=5)
    tuner = Tuner(trainer)

    # scales the batch size, starting from the value passed on the command line
    optimal_batch_size = tuner.scale_batch_size(model, datamodule=datamodule, init_val=args.batch_size,
                                                batch_arg_name="batch_size")
    model.batch_size = optimal_batch_size
    datamodule.batch_size = optimal_batch_size

    # finds the learning rate automatically and sets hparams.lr or hparams.learning_rate
    # to it; pick a point based on the plot, or take the suggestion
    optimal_lr = tuner.lr_find(model, datamodule=datamodule).suggestion()
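For completeness, this is roughly what I do with the suggestion afterwards (just a sketch; the lr attribute name is an assumption on my part, since lr_find looks for either lr or learning_rate on the model/hparams):

    # apply the suggested learning rate and start fine-tuning (sketch)
    model.lr = optimal_lr
    trainer.fit(model, datamodule=datamodule)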
I do have a batch_size property in both my CLIPWrapper and CaptioningDataModule classes. As a side note, the batch_size property inside CLIPWrapper is actually the minibatch size that I use inside training_step; I had to rename it to batch_size to match the BatchSizeFinder's expectations, even though I am no longer sure that was the right thing to do, since I also get the following warning:
UserWarning: Field `model.batch_size` and `model.hparams.batch_size` are mutually exclusive! `model.batch_size` will be used as the initial batch size for scaling. If this is not the intended behavior, please remove either one.
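In case it helps, here is a stripped-down sketch of how batch_size is exposed in my CLIPWrapper (heavily simplified from the real class, so treat the exact __init__ signature as illustrative); I call save_hyperparameters() there, which I suspect is why both model.batch_size and model.hparams.batch_size end up existing:

    import lightning.pytorch as pl


    class CLIPWrapper(pl.LightningModule):
        def __init__(self, batch_size: int = 4):
            super().__init__()
            # save_hyperparameters() copies the init args into self.hparams,
            # so hparams.batch_size is created here ...
            self.save_hyperparameters()
            # ... while this attribute is the minibatch size that training_step
            # actually uses; the BatchSizeFinder reads and writes it as well,
            # hence (I assume) the "mutually exclusive" warning above
            self.batch_size = batch_size

        def training_step(self, batch, batch_idx):
            # splits the batch into minibatches of size self.batch_size (simplified)
            ...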