I’m using the following Trainer configuration:
trainer = Trainer(
accelerator="gpu",
precision=16,
max_epochs=1,
strategy="deepspeed_stage_3",
num_sanity_val_steps=-1,
check_val_every_n_epoch=1,
log_every_n_steps=10,
logger=logger,
limit_train_batches=10, # limit batches for debugging; comment out for full runs
limit_test_batches=5, # limit batches for debugging; comment out for full runs
limit_val_batches=5, # limit batches for debugging; comment out for full runs
accumulate_grad_batches=4,
gradient_clip_val=1.0,
callbacks=[checkpoint_callback, early_stopping_callback],
)
trainer.fit(model, datamodule=dm)
results = trainer.test(model, datamodule=dm) # pass the DataModule itself, not a dataloader
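A minimal sketch of how I confirmed the problem (the `nn.Linear` here is just a stand-in for my actual model; the device check is what I run right after `trainer.fit`):

```python
import torch
import torch.nn as nn

# Stand-in for my model; a freshly constructed module lives on the CPU.
model = nn.Linear(4, 2)

# This is the check I run after trainer.fit(...) returns:
device = next(model.parameters()).device
print(device)  # prints "cpu" — this is what I see after training, too

# What I want instead: keep (or move) the model on GPU for batch inference.
if torch.cuda.is_available():
    model = model.to("cuda")
```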
During training the model is sharded across multiple GPUs as expected. Immediately afterwards, when it is reloaded for testing, the model ends up on the CPU — apparently the backend moves it there, since nothing in my script does. I want to keep the model on the GPUs so I can run batch inference there. How should I modify my code?