RuntimeError: CUDA error: out of memory

I've tried to run a very basic example from one of the tutorials on a small fraction of the MNIST dataset with 'ddp', but I encounter RuntimeError: CUDA error: out of memory.
It works fine with 2 GPUs, but crashes with 4 GPUs.
On the machine I am running on, there are 8 Tesla K40 GPUs with 12 GB of RAM each, and the CUDA version is 11.1.
Here is a very minimal example.

import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl

class MNISTModel(pl.LightningModule):
    def __init__(self):
        super(MNISTModel, self).__init__()
        # not the best model...
        self.l1 = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_nb):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        tensorboard_logs = {'train_loss': loss}
        return {'loss': loss, 'log': tensorboard_logs}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.02)

    def train_dataloader(self):
        return DataLoader(MNIST("~/data", train=True, download=True,
                                transform=transforms.ToTensor()), batch_size=4)


mnist_model = MNISTModel()
trainer = pl.Trainer(max_epochs=5, limit_train_batches=0.1, gpus=4, accelerator='ddp')
trainer.fit(mnist_model)

Could someone please help me understand what I am doing wrong, or what the problem is and how to fix it?

These are my packages:
cudatoolkit 11.0.221
python 3.7.4
pytorch 1.7.1
pytorch-lightning 1.1.4
torchvision 0.8.2

Thank you in advance!

(screenshot: CUDA out-of-memory traceback)

This seems suspicious. Would you mind trying the very same example on a single GPU, for example on Google Colab?
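For a quick sanity check, the single-GPU run would just drop the DDP arguments from the Trainer call. This is only a sketch, assuming the same MNISTModel class and script as above and the PyTorch Lightning 1.1.x API (gpus=1 selects one device, no accelerator argument needed):

# Single-GPU sanity check: same model, same data, no DDP.
mnist_model = MNISTModel()
trainer = pl.Trainer(max_epochs=5, limit_train_batches=0.1, gpus=1)
trainer.fit(mnist_model)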

Thank you, it worked fine with a single GPU. Moreover, it worked fine on another server that has V100 (Volta) GPUs, but failed with the out-of-memory error on the server with Tesla K80s.
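One thing that may be worth ruling out is that other processes already occupy memory on some of the cards of the failing server. A minimal sketch to check per-device usage and then pass only idle devices to the Trainer (assuming nvidia-smi is on the PATH; the index list [0, 1, 2, 3] is just an example, not a recommendation):

import subprocess

# Print per-GPU memory usage as reported by the driver
# (this includes memory held by other users' processes).
print(subprocess.run(
    ["nvidia-smi", "--query-gpu=index,name,memory.used,memory.total", "--format=csv"],
    capture_output=True, text=True).stdout)

# If some devices are busy, pass explicit (idle) device indices instead of a count.
trainer = pl.Trainer(max_epochs=5, limit_train_batches=0.1,
                     gpus=[0, 1, 2, 3], accelerator='ddp')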