After fine-tuning on multiple GPUs, my model is moved to the CPU for testing

I’m using

trainer = Trainer(
        accelerator="gpu",
        precision=16,
        max_epochs=1,
        strategy="deepspeed_stage_3",
        num_sanity_val_steps=-1,
        check_val_every_n_epoch=1,
        log_every_n_steps=10,
        logger=logger,
        limit_train_batches=10,  # limit batches for debugging
        limit_test_batches=5,  # limit batches for debugging
        limit_val_batches=5,  # limit batches for debugging
        accumulate_grad_batches=4,
        gradient_clip_val=1.0,
        callbacks=[checkpoint_callback, early_stopping_callback],
    )

trainer.fit(model, datamodule=dm)

results = trainer.test(model, datamodule=dm)

My model gets distributed across multiple GPUs during training. Right after training, when the model is reloaded for testing, it is moved to the CPU (apparently by the backend; I don't do anything related to device placement in my script). I want to keep the model on the GPUs so that I can run batch inference there, roughly as sketched below. How should I modify my code?
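
For context, this is roughly the batch inference I want to run once testing is done. It is only a minimal sketch: it assumes a single visible GPU for the inference step and that each test batch is a plain tensor; `model` and `dm` are the same objects as in the script above.

import torch

device = torch.device("cuda:0")
model = model.to(device)  # keep (or move) the model on a GPU after testing
model.eval()

predictions = []
with torch.no_grad():
    for batch in dm.test_dataloader():
        batch = batch.to(device)  # move each batch to the same GPU as the model
        predictions.append(model(batch).cpu())  # collect outputs back on the CPU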