Part of my code is:
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, LearningRateMonitor, RichProgressBar

# Keep only the best checkpoint by validation loss
checkpoint_callback = ModelCheckpoint(
    save_weights_only=False, mode="min", monitor="val_loss",
    dirpath="outputs", save_last=False, save_top_k=1,
)
trainer = pl.Trainer(
    gpus=1,
    strategy="dp",
    max_epochs=10,
    auto_lr_find=True,
    precision=16,  # the other run uses precision=32
    callbacks=[
        checkpoint_callback,
        LearningRateMonitor("epoch"),
        RichProgressBar(),
    ],
    log_every_n_steps=10,
)
trainer.tune(model, train_loader, val_loader)
trainer.fit(model, train_loader, val_loader, ckpt_path=None)
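The timing numbers below were collected roughly like this (a minimal sketch, not my exact script; build_model() and the two loaders are placeholders for my actual model and data, and the tuning/callbacks are omitted):

import time
import pytorch_lightning as pl

# Run the same 10-epoch fit once per precision and compare wall-clock time.
for prec in (32, 16):
    model = build_model()  # placeholder for my actual LightningModule
    trainer = pl.Trainer(gpus=1, strategy="dp", max_epochs=10, precision=prec)
    start = time.time()
    trainer.fit(model, train_loader, val_loader)
    print(f"precision={prec}: {time.time() - start:.1f} s")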
After ten epochs, the precision=32 run takes 5m 33s of wall-clock time, while the precision=16 run takes 5m 55s. They are almost the same, and half precision is even slightly slower.
Package versions: pytorch-lightning 1.5.5, torch 1.10.0.
The GPU is a GeForce GTX 1080 Ti and the CUDA version is 11.1. GPU memory usage is 1167 MiB for precision 32 and 1149 MiB for precision 16, so memory consumption is also nearly identical.
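For completeness, memory can also be checked from inside PyTorch with something like this (it counts only tensor allocations, not the CUDA context, so it reports less than the totals above):

import torch

# Peak memory allocated by tensors on the current GPU since program start
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 2**20:.0f} MiB")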
Has anybody ever run into a similar problem?