Trained weights are on CPU despite the model being trained on GPU

I’ve been tackling this problem for several hours but without luck.

I’m training a CNN on the GPU and everything works great. However, once training finishes and I print the weights with model.state_dict(), they reside on the CPU. Even more perplexing: if I save the weights, load them back onto the GPU as follows, and print them again, they are still on the CPU:
torch.save(..., 'outputs/model.pth')
FacesModel = LitFacesModel()
FacesModel.load_state_dict(torch.load('outputs/model.pth', map_location='cuda:0'))

Here’s a reduced version of the model I’m training

class LitFacesModel(pl.LightningModule):
    def __init__(self):
        super().__init__()  # must run before any submodule is assigned
        self.bn1 = nn.BatchNorm2d(32)
        self.bn2 = nn.BatchNorm2d(64)
        self.bn3 = nn.BatchNorm2d(128)
        self.bn4 = nn.BatchNorm2d(256)
        self.cnv1 = nn.Conv2d(3, 32, kernel_size=3)
        self.cnv2 = nn.Conv2d(32, 64, kernel_size=3)
        self.cnv3 = nn.Conv2d(64, 128, kernel_size=3)
        self.cnv4 = nn.Conv2d(128, 256, kernel_size=3)
        self.rel = nn.ReLU()
        self.avg = nn.AvgPool2d(2, 2)
        self.flat = nn.Flatten()
        self.fc1 = nn.Linear(25600, 132)
        self.fc2 = nn.Linear(132, CLASSES)

    def forward(self, x):
        out = self.avg(self.bn1(self.rel(self.cnv1(x))))
        out = self.avg(self.bn2(self.rel(self.cnv2(out))))
        out = self.avg(self.bn3(self.rel(self.cnv3(out))))
        out = self.avg(self.bn4(self.rel(self.cnv4(out))))
        out = self.flat(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=LR)
        lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
        return [optimizer], [lr_scheduler]

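    def training_step(self, batch, batch_idx):
        images, labels = batch
        pred = self(images)
        # The post is cut off here; cross-entropy is an assumed loss.
        loss = nn.functional.cross_entropy(pred, labels)
        return loss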

Any idea why this is happening?


Can you show me the Trainer settings you used?

Absolutely, here are the trainer settings:
trainer = pl.Trainer(max_epochs=EPOCHS, log_every_n_steps=10, profiler="simple", logger=logger, accelerator="gpu", devices=1)

How did you check that the weights are on CPU during training? Did you print them?
Note that after training finishes, the model gets moved back to CPU. So if you do this:
print(model.device)  # prints cpu

It will print “cpu” once training is done. That may be what confused you.
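The save/load part of the question can be reproduced without Lightning. Here is a minimal sketch, assuming a small stand-in network (not the model from the post) and an in-memory buffer instead of outputs/model.pth. It suggests that map_location only controls where the loaded tensors live, while load_state_dict copies values into the parameters the module already has, so the module stays where it was created until it is moved explicitly.

```python
import io

import torch
import torch.nn as nn

# Hypothetical stand-in for the trained network; any nn.Module behaves the same way.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8))

# Use the GPU when one is present; the script still runs on a CPU-only machine.
target = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Save the weights to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location decides where the *loaded tensors* live...
state = torch.load(buffer, map_location=target)
loaded_devices = {t.device.type for t in state.values()}

# ...but load_state_dict copies the values into the existing parameters,
# so the module itself stays on the device it was created on (CPU here).
fresh = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8))
fresh.load_state_dict(state)
devices_after_load = {p.device.type for p in fresh.parameters()}

# Moving the module is a separate, explicit step.
fresh = fresh.to(target)
devices_after_move = {p.device.type for p in fresh.parameters()}

print(loaded_devices, devices_after_load, devices_after_move)
```

Checking next(model.parameters()).device on your own model should tell the same story as the sets printed here.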

Yes, I believe that got me confused. So after the model is trained, let’s say I’d like to test it on an image using pred = model(img). How can I run this prediction on GPU, given that img is on GPU as well?

It depends on whether you want to do it with or without Lightning.

With Lightning: Implement the predict_step hook in the LightningModule and then call trainer.predict(model) with a dataloader that supplies the inputs.

Without Lightning: Call model = model.cuda() and move the input data to the GPU as well.
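For the plain PyTorch route, something along these lines should work. It is only a sketch: the network and the random image are hypothetical placeholders, and the device check is there so the snippet also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn

# Hypothetical small classifier standing in for the trained model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 5),
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()  # inference mode for layers such as BatchNorm and Dropout

# Hypothetical input: one RGB image with a leading batch dimension.
img = torch.rand(1, 3, 64, 64).to(device)

# No gradients are needed for a prediction.
with torch.no_grad():
    pred = model(img)

predicted_class = pred.argmax(dim=1).item()
print(pred.device, pred.shape, predicted_class)
```

The model and the input have to be on the same device; if they are not, PyTorch raises a device mismatch error on the first layer.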

Works like a charm. Thanks!

That means you have to create a dataloader just to call predict. It seems to me that unless I need large-scale or very fast inference, I would rely on the GPU only for training and be fine with running inference on the CPU.