Hello,
I’m training a BERT model for sequence classification (from Hugging Face Transformers). I am using 16-bit precision and have run into the following error:
AssertionError: Attempted step but _scale is None. This may indicate your script did not use scaler.scale(loss or outputs) earlier in the iteration.
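From what I can tell from the PyTorch source, this assertion comes from `torch.cuda.amp.GradScaler`. The scale factor `_scale` is created lazily the first time `scaler.scale(...)` is called, and `scaler.step(optimizer)` asserts that it already exists. So the error seems to mean that an optimizer step ran before any loss had been scaled. The snippet is a simplified, hypothetical stand-in (`ToyGradScaler` is not a real class and does not use torch) that mimics only that ordering check:

```python
class ToyGradScaler:
    """Hypothetical, simplified stand-in for torch.cuda.amp.GradScaler.

    It mimics only one detail: the scale factor is created lazily on the
    first scale() call, and step() asserts that it already exists.
    """

    def __init__(self, init_scale=2.0 ** 16):
        self._init_scale = init_scale
        self._scale = None  # stays None until scale() is called once

    def scale(self, loss):
        if self._scale is None:
            self._scale = self._init_scale  # lazy initialization
        return loss * self._scale

    def step(self, optimizer_step):
        fix = ("This may indicate your script did not use "
               "scaler.scale(loss or outputs) earlier in the iteration.")
        assert self._scale is not None, "Attempted step but _scale is None. " + fix
        optimizer_step()


steps_taken = []
scaler = ToyGradScaler()

# Wrong order: step() before any scale() call triggers the assertion.
try:
    scaler.step(lambda: steps_taken.append("step"))
except AssertionError as err:
    print(err)

# Right order: scale the loss first (real code would call .backward() on it),
# then step.
scaled_loss = scaler.scale(0.5)
scaler.step(lambda: steps_taken.append("step"))
print(scaled_loss, steps_taken)  # 32768.0 ['step']
```

In real AMP code the order is `scaler.scale(loss).backward()`, then `scaler.step(optimizer)`, then `scaler.update()`; with precision=16 Lightning is supposed to handle this for me, which is why the error is surprising.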
32-bit training, on the other hand, runs without any problems. I am using PyTorch Lightning 1.1.2. I should note that I ran similar code with an earlier version (I don’t remember which one, but it was > 1.0.0) and didn’t run into any problems with 16-bit training. Here are the training arguments:
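```python
from argparse import Namespace
from pathlib import Path

from pl_bolts.callbacks import PrintTableMetricsCallback  # from pytorch-lightning-bolts
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint
from pytorch_lightning.loggers import CSVLogger

# model_dir is defined elsewhere in my script

early_stop_callback = EarlyStopping(
    monitor='val_loss',
    min_delta=0.0,
    patience=5,
    verbose=False,
    mode='min',
)  # note: not included in the Trainer arguments below
logger = CSVLogger(
    save_dir=f'{model_dir}',
    name=None,
)
checkpoint_callback = ModelCheckpoint(
    filepath=Path(f'{logger.log_dir}/checkpoints') / '{epoch}-{val_loss:0.3f}-{val_accuracy:0.3f}',
    save_top_k=3,
    monitor='val_loss',
    verbose=True,
    mode='min',
    prefix='',
)
callbacks = [
    PrintTableMetricsCallback(),
]
trainer_args = Namespace(
    progress_bar_refresh_rate=1,
    max_epochs=2,
    gpus=1,
    accumulate_grad_batches=1,
    precision=16,  # triggers the error; precision=32 works
    overfit_batches=0.1,
    checkpoint_callback=checkpoint_callback,
    logger=logger,
    callbacks=callbacks,
    fast_dev_run=True,
    reload_dataloaders_every_epoch=True,
)
```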
I can post the model code if needed. I’d like to train with 16-bit precision instead of 32-bit so that I can increase my batch size.
Thanks.