trainer = Trainer(
    accelerator="gpu",
    precision=16,
    max_epochs=1,
    strategy="deepspeed_stage_3",
    num_sanity_val_steps=-1,
    check_val_every_n_epoch=1,
    log_every_n_steps=10,
    logger=logger,
    limit_train_batches=10,  # comment out for full runs; kept small for debugging
    limit_test_batches=5,    # comment out for full runs; kept small for debugging
    limit_val_batches=5,     # comment out for full runs; kept small for debugging
    accumulate_grad_batches=4,
    gradient_clip_val=1.0,
    callbacks=[checkpoint_callback, early_stopping_callback],
)
trainer.fit(model, datamodule=dm)
# Pass the DataModule itself, not a dataloader, as the `datamodule` argument:
results = trainer.test(model, datamodule=dm)
My model is distributed across multiple GPUs during training. Right after, when it is reloaded for testing, the model ends up on the CPU (apparently moved by the backend; I don't do anything device-related in my script). I want to keep the model on the GPUs so that I can run batch inference there. How should I modify my code?
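As context for what "keep the model on the GPUs" means here: independent of what the DeepSpeed backend does with sharded weights, a reloaded model can always be placed on an accelerator explicitly before inference. The sketch below is a generic PyTorch pattern, not Lightning-specific; the helper name `place_for_inference` and the toy `Linear` model are illustrative assumptions, not part of the original script.

```python
import torch

def place_for_inference(model: torch.nn.Module) -> torch.nn.Module:
    """Move a (possibly CPU-reloaded) model to GPU when one is available."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)   # moves all parameters and buffers in place
    model.eval()               # disable dropout / batch-norm updates
    return model

# Usage with a toy stand-in for the reloaded model:
toy = torch.nn.Linear(4, 2)
toy = place_for_inference(toy)
print(next(toy.parameters()).device.type)  # "cuda" if a GPU is visible, else "cpu"
```

Note this only covers a model that fits on a single device; a model that genuinely needs stage-3 sharding cannot simply be gathered onto one GPU this way.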