Ever since my model / data got a bit bigger, prediction seems to take forever / hang and never gets to the next training / predict step anymore.
This is in the context of live prediction: I am only interested in the last 200 rows to make one final prediction, so it should not be loading a lot of data.
My model has 770k params (LSTM) and I am running on MPS (M1).
In case it’s relevant, I use TimeSeriesDataSet from pytorch-forecasting to build the dataloader:
```python
def set_prod_data(self, prod_df):
    self.prod_dataset = TimeSeriesDataSet.from_dataset(
        self.training_dataset, prod_df, predict_mode=True
    )
    self.prod_dataloader = self.prod_dataset.to_dataloader(
        train=False,
        batch_size=self.p['batch_size'] * 1,
        num_workers=self.p['num_workers'],
    )
    return self.prod_dataloader, self.prod_dataset
```
and in main.py:
```python
loaders = dataloader.Dataloaders(dataset, dataset_predict, p)
prod_dataloader, prod_dataset = loaders.set_prod_data(dataset_prod)

model = lstm.LSTMClassifier(p, loaders)

training = training.Training(p, device, reset=False)
trainer = training.get_trainer()

eval = eval.Eval(p, trainer, model, loaders)

print('Starting evaluation')  # this is printed
trainer.predict(model=model, dataloaders=prod_dataloader)
```
I have a breakpoint set on the first line of `def predict_step(self, batch, batch_idx):`, but execution never reaches it.
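To narrow this down, I was thinking of first checking whether the dataloader itself is what hangs, before any batch even reaches the model. A minimal sketch of what I have in mind, reusing the `prod_dataset` from above and rebuilding the dataloader with `num_workers=0` so no worker processes are spawned (DataLoader workers are sometimes a source of hangs on macOS):

```python
# Rebuild the production dataloader without worker processes and pull one
# batch in the main process; if this hangs, the dataset/dataloader is the culprit.
debug_loader = prod_dataset.to_dataloader(train=False, batch_size=200, num_workers=0)

x, y = next(iter(debug_loader))
print({k: v.shape for k, v in x.items() if hasattr(v, 'shape')})
```

Would that be a sensible first check?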
I am really at a loss as to what is happening. Is the LSTM really too big to fit in memory? My batch size is only 200, and the production data is tiny.
How can I figure out where the infinite loop is? Execution just seems to disappear after calling .predict() and never reaches predict_step().
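The only way I could think of to see where it is actually stuck is Python's built-in faulthandler, which can dump the stack of every thread after a timeout (this is just a guess on my side, nothing Lightning-specific):

```python
import faulthandler

# Dump all thread stacks to stderr after 60 s, repeating until cancelled,
# so the traceback shows which call inside trainer.predict() never returns.
faulthandler.dump_traceback_later(60, repeat=True)

trainer.predict(model=model, dataloaders=prod_dataloader)

faulthandler.cancel_dump_traceback_later()
```

Is there a better / more standard way to debug this?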
I have all my loaders in the loaders object (including the training dataloader). It is passed when the LSTM is initialized, because I use loaders.decode to do some checks during training. But still, the loading finishes, so I am unsure what is happening.
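Since everything up to the .predict() call seems to initialize fine, I also considered bypassing the Trainer entirely and calling predict_step() on a single batch myself, to see whether the hang is in Lightning's predict loop or in the model / MPS. A rough sketch (assuming the model stays on CPU for this test and the batches from the dataloader can be fed to predict_step as-is):

```python
import torch

model.eval()
batch = next(iter(prod_dataloader))      # if this hangs, it is the dataloader
with torch.no_grad():
    out = model.predict_step(batch, 0)   # if this hangs instead, it is the model / MPS
print(type(out))
```

Does that isolation strategy make sense, or is there something obvious I am missing?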