Hi, I'm writing a PyTorch Lightning wrapper for training a machine-learned interatomic potential, where double precision is used frequently. All of my input data to the model is generally in double precision. What are the best practices to ensure that I can use both single and double precision when needed?
In my wrapper, `pl.Trainer` works fine with `precision=32`, but with `precision=64` I get this error:
```
File "/opt/mambaforge/mambaforge/envs/colabfit/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Found dtype Float but expected Double
```
This is surprising and the opposite of what I expected; I thought single precision would be the problem, not double.
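For context, this exact message can be reproduced outside Lightning whenever a float32 tensor reaches the backward pass of a float64 model. A minimal sketch, under the assumption that the mismatch comes from a target tensor that was never cast to double:

```python
import torch

# Hypothetical repro: a double-precision model, but a target tensor
# accidentally left in float32 (e.g. loaded from a float32 dataset).
model = torch.nn.Linear(3, 1).double()
x = torch.randn(4, 3, dtype=torch.float64)
y = torch.randn(4, 1, dtype=torch.float32)   # float32, not float64

# The forward pass promotes dtypes silently, so no error appears here.
loss = torch.nn.functional.mse_loss(model(x), y)
try:
    loss.backward()
except RuntimeError as err:
    print(err)   # the dtype mismatch only surfaces in backward
```

The forward pass succeeds via type promotion, so the mismatch only surfaces when autograd runs the backward pass, which matches the traceback above.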
Below is the line where I build the trainer (the `precision` value is computed from my manifest and passed through):

```python
precision = 32 if self.model_manifest["precision"] == "single" else 64
return pl.Trainer(
    logger=[self.tb_logger, self.csv_logger],
    max_epochs=self.optimizer_manifest["epochs"],
    accelerator="auto",
    strategy="ddp",
    callbacks=self.callbacks,
    num_nodes=num_nodes,
    precision=precision,
)
```