RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!

When using PyTorch Lightning 1.5.8, I got this error:
“RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)”
The traceback is as follows:


Here is my code.

def training_step(self, batch):
    x = batch['feature'].float()
    y_app = batch['app_label'].long()
    x_tra_all, y_tra_all = drop_na(batch)

    app_out = self(x)
    tra_all_out = self(x_tra_all)
    out_app = nn.Linear(in_features=50, out_features=17)
    out_tra = nn.Linear(in_features=50, out_features=12)
    out_all = nn.Linear(in_features=50, out_features=6)

    y_hat_app = out_app(app_out)
    y_hat_tra = out_tra(tra_all_out)
    y_hat_all = out_all(tra_all_out)

    entropy_app = F.cross_entropy(y_hat_app, y_app)
    entropy_tra = F.cross_entropy(y_hat_tra, y_tra_all)
    entropy_all = F.cross_entropy(y_hat_all, y_tra_all)
    entropy = (entropy_app + entropy_tra + entropy_all) / 3.0
    self.log('training_loss', entropy, prog_bar=True, logger=True, on_step=True, on_epoch=True)
    loss = {'loss': entropy}

    return loss

Then I tried another version of the code and got the same error.

def training_step(self, batch):
    x = batch['feature'].float()
    y_app = batch['app_label'].long()
    x_tra_all, y_tra_all = drop_na(batch)

    app_out = self(x)
    app_out = app_out.type_as(x)

    tra_all_out = self(x_tra_all)
    out_app = nn.Linear(in_features=50, out_features=17)
    out_tra = nn.Linear(in_features=50, out_features=12)
    out_all = nn.Linear(in_features=50, out_features=6)

    # .to(device=torch.device('cuda' if torch.cuda.is_available() else 'cpu'))
    y_hat_app = torch.zeros(batch['app_label'].shape).long()
    y_hat_app = y_hat_app.type_as(y_app)
    y_hat_app = out_app(app_out)

    y_hat_tra = out_tra(tra_all_out)
    y_hat_all = out_all(tra_all_out)

    entropy_app = F.cross_entropy(y_hat_app, y_app)
    entropy_tra = F.cross_entropy(y_hat_tra, y_tra_all)
    entropy_all = F.cross_entropy(y_hat_all, y_tra_all)
    entropy = (entropy_app + entropy_tra + entropy_all) / 3.0
    self.log('training_loss', entropy, prog_bar=True, logger=True, on_step=True, on_epoch=True)
    loss = {'loss': entropy}

    return loss

Any advice on what happened?
Thank you very much!

Hi @Wen_C, when you initialize nn.Linear inside training_step, it will not be moved to the correct device by PL, so its weights stay on the CPU while your activations are on cuda:0. Also, is there any specific reason to create it in training_step? I would recommend creating any modules inside the __init__ method so that you can take advantage of all the features of PL.
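For example, here is a minimal sketch of what that could look like (assuming your existing drop_na helper and a feature size of 50; the backbone and its input size of 100 are just placeholders for your real model):

import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl

class MultiHeadModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Placeholder feature extractor: replace with your real layers.
        self.backbone = nn.Linear(in_features=100, out_features=50)
        # Registering the heads here lets Lightning move them to the same
        # device as the rest of the module and train their weights.
        self.out_app = nn.Linear(in_features=50, out_features=17)
        self.out_tra = nn.Linear(in_features=50, out_features=12)
        self.out_all = nn.Linear(in_features=50, out_features=6)

    def forward(self, x):
        return self.backbone(x)

    def training_step(self, batch, batch_idx):
        x = batch['feature'].float()
        y_app = batch['app_label'].long()
        x_tra_all, y_tra_all = drop_na(batch)  # helper from your snippet

        app_out = self(x)
        tra_all_out = self(x_tra_all)

        # The heads already live on self.device, so no manual .to() / .type_as() is needed.
        entropy_app = F.cross_entropy(self.out_app(app_out), y_app)
        entropy_tra = F.cross_entropy(self.out_tra(tra_all_out), y_tra_all)
        entropy_all = F.cross_entropy(self.out_all(tra_all_out), y_tra_all)
        entropy = (entropy_app + entropy_tra + entropy_all) / 3.0

        self.log('training_loss', entropy, prog_bar=True, logger=True, on_step=True, on_epoch=True)
        return {'loss': entropy}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

Because the heads are registered as submodules in __init__, their weights end up on the same device as the activations, which is what the error is complaining about. It also fixes a second problem: a fresh nn.Linear created inside training_step is re-initialized on every step and its weights are never seen by the optimizer, so it would not learn anything.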

Also, we have migrated from this forum to Github Discussions so I would recommend you to ask your questions there for quicker response.

Thanks :slight_smile: