Multi-GPU Inferencing

Adityam_Ghosh · September 30, 2022, 4:12am

Hi everyone,

I am trying to run inference (prediction) with my LightningModule on multiple GPUs. My LightningModule is as follows:

```python
import pytorch_lightning as pl
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from transformers import DistilBertModel

# get_scheduler(optimizer, config) is assumed to be a project-specific helper
# (not shown); it is not transformers.get_scheduler, whose arguments differ.


class DistilBERTRegressor(pl.LightningModule):
    def __init__(self, config):
        super().__init__()

        self.config = config
        self.dbert = DistilBertModel.from_pretrained(config['bert']['name'], config=config['bert']['config'])

        # Regression head on top of the [CLS] embedding. Note: there is no
        # activation between the linear layers, so together they act as one affine map.
        self.drop = nn.Dropout(p=config['dropout'])
        self.linear1 = nn.Linear(self.dbert.config.hidden_size, self.config['fc']['linear1'])
        self.linear2 = nn.Linear(self.config['fc']['linear1'], self.config['fc']['linear2'])
        self.linear3 = nn.Linear(self.config['fc']['linear2'], 1)

        torch.nn.init.xavier_uniform_(self.linear1.weight)
        torch.nn.init.xavier_uniform_(self.linear2.weight)

    def forward(self, input_ids, attention_mask):
        dbert_out = self.dbert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            return_dict=True
        )

        last_hidden_state = dbert_out.last_hidden_state
        cls_token = last_hidden_state[:, 0, :]  # embedding of the first ([CLS]) token
        yhat = self.drop(cls_token)
        yhat = self.linear1(yhat)
        yhat = self.linear2(yhat)
        yhat = self.linear3(yhat)

        return yhat  # shape: (batch_size, 1)

    def compute_loss(self, yhat, y):
        # RMSE: square root of the mean squared error
        y = y.reshape(-1, 1)
        return torch.sqrt(F.mse_loss(yhat, y))

    def training_step(self, batch, batch_idx):
        input_ids, attention_mask, targets = batch['input_ids'], batch['attention_mask'], batch['target']
        outputs = self(input_ids, attention_mask)

        loss = self.compute_loss(outputs, targets.type_as(outputs))  # Calculates the loss

        self.log("train_loss", loss, prog_bar=True, logger=True, sync_dist=True)

        return {
            'loss': loss,
        }

    def validation_step(self, batch, batch_idx):
        input_ids, attention_mask, targets = batch['input_ids'], batch['attention_mask'], batch['target']
        outputs = self(input_ids, attention_mask)

        loss = self.compute_loss(outputs, targets.type_as(outputs))  # Calculates the loss

        self.log("val_loss", loss, prog_bar=True, logger=True, sync_dist=True)

        return {
            'val_loss': loss,
        }

    def predict_step(self, batch, batch_idx):
        # Targets are not needed (and may be absent) at inference time.
        input_ids, attention_mask = batch['input_ids'], batch['attention_mask']
        return self(input_ids, attention_mask)

    def configure_optimizers(self):
        optimizer = optim.AdamW(self.parameters(), lr=self.config['lr'], weight_decay=self.config['weight_decay'])
        scheduler = get_scheduler(optimizer, self.config)

        return dict(
            optimizer=optimizer,
            lr_scheduler=scheduler
        )
```
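Under DDP, each GPU runs its own process, and `predict_step` is only called on the batches that process's sampler yields. Below is a minimal stdlib sketch of how PyTorch's `torch.utils.data.DistributedSampler` assigns dataset indices to ranks. It assumes `shuffle=False` and `drop_last=False`. The function name `distributed_shards` is hypothetical, and the sampler Lightning actually installs for `predict` may differ between versions, so treat this as an illustration of the default PyTorch behavior:

```python
import math


def distributed_shards(dataset_len, world_size):
    """Hypothetical helper mimicking DistributedSampler index assignment
    (shuffle=False, drop_last=False) for a non-empty dataset."""
    indices = list(range(dataset_len))
    num_samples = math.ceil(dataset_len / world_size)  # per-rank sample count
    total_size = num_samples * world_size
    padding = total_size - dataset_len
    # Pad by repeating indices from the start so every rank gets the same count.
    indices += (indices * math.ceil(padding / dataset_len))[:padding]
    # Rank r takes every world_size-th index, starting at position r.
    return [indices[rank:total_size:world_size] for rank in range(world_size)]


for rank, shard in enumerate(distributed_shards(10, 4)):
    print(rank, shard)
```

With 10 samples on 4 GPUs this prints `0 [0, 4, 8]`, `1 [1, 5, 9]`, `2 [2, 6, 0]` and `3 [3, 7, 1]`: every rank gets 3 indices, and indices 0 and 1 are predicted twice because of padding. Each process therefore holds only a strided slice of the results, and naively concatenating the slices gives the wrong order plus duplicates.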

The problem is that I cannot get the full set of prediction results back. Can you please help me out?

aniketmaurya · November 3, 2022, 9:40am

Hi @Adityam_Ghosh, could you explain the problem in more detail? What result do you expect, and what do you actually get? An example would help.

donglihe-hub · August 17, 2023, 5:32pm

The inference results are split across the different processes (one per GPU, possibly on different machines) and are not gathered automatically. As far as I can tell, this issue is still unresolved.
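One possible workaround, an assumption here and not something confirmed in this thread, is to have each process record which dataset indices it predicted, collect the `(indices, predictions)` pairs from all ranks (for example with `torch.distributed.all_gather_object`, or by writing one file per rank from Lightning's `BasePredictionWriter` callback), and merge them on a single process. The merge step itself can be sketched in plain Python. `merge_rank_predictions` is a hypothetical name, and the per-rank data in the usage example is made up for illustration:

```python
def merge_rank_predictions(per_rank):
    """Hypothetical helper: combine (indices, predictions) pairs from every
    rank into one list ordered by dataset index, dropping padded duplicates."""
    merged = {}
    for indices, preds in per_rank:
        if len(indices) != len(preds):
            raise ValueError("each rank needs exactly one index per prediction")
        for idx, pred in zip(indices, preds):
            # A padded duplicate repeats an index; keep the first prediction seen.
            merged.setdefault(idx, pred)
    return [merged[i] for i in sorted(merged)]


# Made-up results from 4 ranks over a 10-sample dataset (indices 0 and 1 padded).
per_rank = [
    ([0, 4, 8], [0.0, 0.4, 0.8]),
    ([1, 5, 9], [0.1, 0.5, 0.9]),
    ([2, 6, 0], [0.2, 0.6, 0.0]),
    ([3, 7, 1], [0.3, 0.7, 0.1]),
]
print(merge_rank_predictions(per_rank))
```

Keying by dataset index rather than relying on batch order makes the merge independent of how the sampler interleaved the samples, and it also discards the duplicates that padding introduces.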
