The closure passed to the optimizer is None when using fp16

It looks like closures aren't supported with 16-bit precision training; per the docs, `GradScaler.step` does not currently support closure use:
https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.step
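For reference, here is a minimal sketch of the standard AMP recipe from the linked docs (the toy model, data, and optimizer are just for illustration). The forward/backward pass runs *before* `scaler.step(optimizer)`, which is why there is no closure left to pass through to the optimizer:

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

# Hypothetical toy setup, only for illustration.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
use_amp = torch.cuda.is_available()  # AMP is a no-op on CPU
scaler = GradScaler(enabled=use_amp)

x = torch.randn(4, 10)
y = torch.randn(4, 1)

optimizer.zero_grad()
# Forward pass under autocast, backward on the scaled loss --
# both happen before the optimizer step, not inside a closure.
with autocast(enabled=use_amp):
    loss = nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)  # scaler.step(optimizer, closure) is not supported
scaler.update()
```

So optimizers that require a closure (e.g. LBFGS, which may re-evaluate the loss multiple times per step) don't fit this recipe, and the closure arrives as `None` under fp16.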