Module is not converging

Hey everyone!
I am quite new to deep learning and PyTorch Lightning, and I am having some issues with my loss values while trying to pre-train BERT for recommendation from scratch.

I followed this tutorial https://towardsdatascience.com/build-your-own-movie-recommender-system-using-bert4rec-92e4e34938c5 and used its GitHub code as my starting point for the BERT4Rec implementation.

Here is a snippet with the relevant parts of my module implementation:

import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.nn import Linear


class Recommender(pl.LightningModule):
    def __init__(self, vocabulary_size, features=128, mask=1, dropout=0.4, lr=5e-5, iterations=[]):
        super().__init__()
        ...
        # (the omitted lines store the arguments, e.g. self.vocabulary_size,
        # self.dropout, self.mask and self.lr, which are used below)
        self.item_embeddings = torch.nn.Embedding(self.vocabulary_size, embedding_dim=features)

        self.input_pos_embedding = torch.nn.Embedding(512, embedding_dim=features)

        encoder_layer = nn.TransformerEncoderLayer(d_model=features, nhead=4, dropout=self.dropout)

        self.encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=6)
        self.linear_out = Linear(features, self.vocabulary_size)

    def encode_src(self, src_items):
        # item embeddings plus learned positional embeddings
        src_items = self.item_embeddings(src_items)
        batch_size, in_sequence_len = src_items.size(0), src_items.size(1)
        pos_encoder = (
            torch.arange(0, in_sequence_len, device=src_items.device)
            .unsqueeze(0)
            .repeat(batch_size, 1)
        )
        pos_encoder = self.input_pos_embedding(pos_encoder)
        src_items += pos_encoder
        # TransformerEncoder expects (seq_len, batch, features)
        src = src_items.permute(1, 0, 2)
        src = self.encoder(src)
        return src.permute(1, 0, 2)

    def forward(self, src_items):
        src = self.encode_src(src_items)
        out = self.linear_out(src)
        return out

    def training_step(self, batch, batch_idx):
        src_items, y_true = batch

        y_pred = self(src_items)

        y_pred = y_pred.view(-1, y_pred.size(2))
        y_true = y_true.view(-1)

        # only the masked positions contribute to the loss
        src_items = src_items.view(-1)
        mask = src_items == self.mask

        loss = masked_ce(y_pred=y_pred, y_true=y_true, mask=mask)
        accuracy = masked_accuracy(y_pred=y_pred, y_true=y_true, mask=mask)

        self.log("train_loss", loss)
        self.log("train_accuracy", accuracy)
        return loss

    ...

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=self.lr)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
            optimizer, patience=10, factor=0.1
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": scheduler,
            "monitor": "valid_loss",
        }
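
For reference, masked_ce and masked_accuracy are helper functions from the tutorial's repo. I haven't copied them here verbatim (the exact code is in the linked repo), but paraphrasing, they compute the cross-entropy and accuracy only over the masked positions, roughly like this:

import torch


def masked_ce(y_pred, y_true, mask):
    # per-position cross-entropy, then keep only the masked positions
    loss = torch.nn.functional.cross_entropy(y_pred, y_true, reduction="none")
    loss = loss * mask
    return loss.sum() / (mask.sum() + 1e-8)


def masked_accuracy(y_pred, y_true, mask):
    # accuracy computed only over the masked positions
    _, predicted = torch.max(y_pred, dim=1)
    y_true = torch.masked_select(y_true, mask)
    predicted = torch.masked_select(predicted, mask)
    return (y_true == predicted).double().mean()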

For some reason, my loss stays pretty much the same throughout training. Here is the average loss for each epoch over the first 32 epochs:
[7.691485668923165, 7.690969317763656, 7.6902515966971, 7.689588720018083, 7.686376930595757, 7.685173241345136, 7.688746468560235, 7.683287980439546, 7.685947586227585, 7.683389254160471, 7.674922955048096, 7.678214648345092, 7.6736966854817155, 7.679115080618644, 7.678637226780614, 7.677104617740299, 7.6784126775281445, 7.674682577570398, 7.672071377674977, 7.668677749099197, 7.674774644849776, 7.668729655138843, 7.676391048832341, 7.660469470439373, 7.667116234371731, 7.662718962382029, 7.663188390664987, 7.663334126229043, 7.667270759681801, 7.665728591941856, 7.665296751696307, 7.662635789857851, 7.659676546091074]
As you can see, the numbers hover around 7.7 and aren't really going anywhere. What could be the reason for this?
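
One thing I noticed about the value itself: the cross-entropy of a model that always predicts a uniform distribution over V items is ln(V), so a loss stuck around 7.7 would be consistent with the model never doing better than a uniform guess, if the vocabulary is a couple of thousand items. This is just a back-of-the-envelope check with an illustrative vocabulary size, not my actual one:

import math

# ln(V) is the cross-entropy of a uniform prediction over V classes;
# 2200 is only an illustrative vocabulary size
print(math.log(2200))  # ~7.70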

Some responses to similar issues suggest playing around with the hyperparameters, so I tried a few things:

  • Since the model could be stuck in a local minimum, it could be that the learning rate needs to be changed in order to escape it (is that right?). I tried changing my learning rate from 1e-4 to 5e-5, but it didn't help much.
  • To check whether the training works as expected, I tried to overfit my model on a small number of samples (10), roughly set up as sketched after this list, and the average loss for the first 20 epochs looks as follows:
    [0.0, 0.0, 0.0, 1.2672607898712158, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.9458177089691162, 0.0, 0.0, 0.0, 1.6752853393554688, 0.0, 0.0]
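
For context, the overfitting experiment was set up roughly like this; full_dataset and vocab_size are placeholders for my actual dataset and vocabulary size, not the exact script I ran:

import pytorch_lightning as pl
from torch.utils.data import DataLoader, Subset

# take only 10 sequences from the full training set and train on just those
small_train = Subset(full_dataset, range(10))
small_loader = DataLoader(small_train, batch_size=10, shuffle=True)

model = Recommender(vocabulary_size=vocab_size, lr=5e-5)
trainer = pl.Trainer(max_epochs=20, log_every_n_steps=1)
# reuse the same 10 sequences for validation so the "valid_loss" monitor
# used by ReduceLROnPlateau still has something to track
trainer.fit(model, train_dataloaders=small_loader, val_dataloaders=small_loader)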

Any suggestions would be much appreciated!