Invalid Datatype for loaders - Pytorch Lightning DataModule

Digital_Moniker · December 18, 2021, 5:58pm

Hi, I’m working on a text summarization exercise. I have train and test datasets with two columns: text and summary (the labels). I’m using T5, PyTorch, and the PyTorch Lightning wrapper. My PyTorch Dataset class returns the dictionary below, which holds the raw text and summary plus the input ids, labels, and attention masks as tensors.

Link to the Colab notebook
Link to the NEWS dataset

return dict(
    text=text,
    summary=data_row['summary'],
    text_input_ids=text_encoding['input_ids'].flatten(),
    text_attention_mask=text_encoding['attention_mask'].flatten(),
    labels=labels.flatten(),
    labels_attention_mask=summary_encoding['attention_mask'].flatten()
)
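The snippet above is only the `return` statement of `__getitem__`. For anyone trying to reproduce the setup, here is a minimal sketch of a full `TextSummaryDataset` that would produce that dictionary. Several details are assumptions, not taken from the post: the dataframe has `text` and `summary` columns read via `.iloc`, the tokenizer is a Hugging Face tokenizer such as `T5Tokenizer` called with fixed-length padding, and `labels` are the summary token ids with padding replaced by -100 so the T5 loss ignores them.

```python
from torch.utils.data import Dataset


class TextSummaryDataset(Dataset):
    """Sketch: one tokenized (text, summary) pair per row of `data`."""

    def __init__(self, data, tokenizer, text_max_token_len=512, summary_max_token_len=128):
        self.data = data            # assumed: DataFrame with 'text' and 'summary' columns
        self.tokenizer = tokenizer  # assumed: a Hugging Face tokenizer such as T5Tokenizer
        self.text_max_token_len = text_max_token_len
        self.summary_max_token_len = summary_max_token_len

    def __len__(self):
        return len(self.data)

    def _encode(self, s, max_len):
        # Pad/truncate to a fixed length so the default collate_fn can stack the tensors.
        return self.tokenizer(
            s,
            max_length=max_len,
            padding="max_length",
            truncation=True,
            return_attention_mask=True,
            add_special_tokens=True,
            return_tensors="pt",
        )

    def __getitem__(self, index):
        data_row = self.data.iloc[index]
        text = data_row["text"]
        text_encoding = self._encode(text, self.text_max_token_len)
        summary_encoding = self._encode(data_row["summary"], self.summary_max_token_len)

        # Assumption: labels are the summary ids with padding set to -100,
        # the index ignored by the cross-entropy loss in T5ForConditionalGeneration.
        labels = summary_encoding["input_ids"].clone()
        labels[labels == self.tokenizer.pad_token_id] = -100

        # return_tensors="pt" yields shape (1, max_len); flatten() drops the batch dim.
        return dict(
            text=text,
            summary=data_row["summary"],
            text_input_ids=text_encoding["input_ids"].flatten(),
            text_attention_mask=text_encoding["attention_mask"].flatten(),
            labels=labels.flatten(),
            labels_attention_mask=summary_encoding["attention_mask"].flatten(),
        )
```

Padding every example to the same length keeps batching simple with the default `DataLoader` collate function, at the cost of some wasted computation on short texts.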

Then I have a Lightning DataModule class that converts the dataframes into PyTorch datasets, wraps them in data loaders, and returns the train, val, and test data loaders:

class TextSummaryDataModule(pl.LightningModule):
  def __init__(
      self, 
      train_df: pd.DataFrame, 
      test_df: pd.DataFrame, 
      tokenizer: T5Tokenizer, 
      batch_size: int=8, 
      text_max_token_len: int=512, 
      summary_max_token_len: int=128
    ):
    
      super().__init__()
      
      self.train_df = train_df
      self.test_df = test_df

      self.tokenizer = tokenizer
      self.batch_size = batch_size
      self.text_max_token_len = text_max_token_len
      self.summary_max_token_len = summary_max_token_len

  def setup(self):
    self.train_dataset = TextSummaryDataset(
        self.train_df,
        self.tokenizer,
        self.text_max_token_len,
        self.summary_max_token_len
    )

    self.test_dataset = TextSummaryDataset(
        self.test_df,
        self.tokenizer,
        self.text_max_token_len,
        self.summary_max_token_len
    )

  def train_dataloader(self):
    return DataLoader(
        self.train_dataset,
        batch_size = self.batch_size,
        shuffle=True,
        num_workers=2
    )

  def val_dataloader(self):
    return DataLoader(
        self.test_dataset,
        batch_size = self.batch_size,
        shuffle=False,
        num_workers=2
    )

  def test_dataloader(self):
    return DataLoader(
        self.test_dataset,
        batch_size = self.batch_size,
        shuffle=False,
        num_workers=2
    )

Everything works until I try to run the model, at which point I get the following warning and error:

UserWarning: you defined a validation_step but have no val_dataloader. Skipping validation loop
(But I have defined val_dataloader and returned it in the data module.)
Invalid Datatype for loaders: TextSummaryDataModule
(But I am returning a dictionary of the tokens, attention_mask, and labels for both text and summary.)
