Weird number of steps per epoch

YNWA · February 23, 2021, 4:36pm

Hello, I’m facing an issue where a weird number of steps per epoch is displayed and processed during training.
The number of steps per epoch should be len(train_dataloader) // BATCH_SIZE, as defined in my code. However, I’m getting a different number that matches neither len(train_dataloader) nor len(train_dataloader) // BATCH_SIZE.

Below is a Colab link to my code: Google Colab

Any thoughts on why I’m getting this?

goku · February 23, 2021, 5:09pm

If you are referring to the steps displayed in the progress bar, then the total shown there is actually total_train_steps + total_val_steps. In your case, the displayed value is 1076, with train_batches = 112 and val_batches = 964, so the total is 112 + 964 = 1076.
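To make the arithmetic concrete, here is a minimal sketch in plain Python. The function name progress_bar_total is hypothetical and is not part of Lightning; it only mirrors the sum described above, using the batch counts from this thread.

```python
def progress_bar_total(train_batches: int, val_batches: int) -> int:
    """Hypothetical helper: steps shown per epoch when the progress bar
    counts training and validation batches together."""
    return train_batches + val_batches


# Batch counts quoted in this thread.
train_batches = 112
val_batches = 964

total = progress_bar_total(train_batches, val_batches)
print(total)  # 1076
```

With val_batches set to 0, the same sum reduces to the training batches alone.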

YNWA · February 23, 2021, 6:16pm

Indeed, I’m referring to the steps displayed in the progress bar, so it’s normal to see the value 1076, right?
What if I only want to display train_batches?

goku · February 23, 2021, 9:35pm

Yes.

And that is not possible, I guess, since it doesn’t work that way. However, if you disable validation, it will display just the train_batches.
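One way to disable validation, as far as I know, is the Trainer argument limit_val_batches=0. The sketch below only builds the keyword arguments so that it runs without Lightning installed. The max_epochs value is a placeholder, and the commented lines assume the pytorch_lightning package along with your own model and dataloader objects.

```python
# Keyword arguments intended for pytorch_lightning.Trainer.
trainer_kwargs = {
    "max_epochs": 10,        # placeholder value, not taken from the notebook
    "limit_val_batches": 0,  # 0 validation batches, i.e. validation is skipped
}

# Assumed usage with Lightning installed (model and train_dataloader are yours):
# import pytorch_lightning as pl
# trainer = pl.Trainer(**trainer_kwargs)
# trainer.fit(model, train_dataloader)
```

Keep in mind that validation metrics are then not computed at all during fit.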

YNWA · February 24, 2021, 4:48pm

Thank you for your reply. I couldn’t find this explanation anywhere in the docs.

Still with the same code that I provided, I’m seeing the loss decrease while the F1 metric does not increase. Any idea where this behaviour is coming from?

goku · February 24, 2021, 5:45pm

The notebook is big, so I can’t look at the complete code for now. But I’d suggest checking the metrics package thoroughly and whether you are using it correctly in your code. If F1 is not increasing, you might find an error there.
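As a sanity check, you could compute F1 by hand on a small batch and compare it with what the metrics package returns for the same inputs. This is only a sketch for the binary case, with made-up labels and a hypothetical function name binary_f1. It is not the code from the notebook.

```python
def binary_f1(y_true, y_pred):
    """Hand-rolled binary F1: 2*TP / (2*TP + FP + FN), with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positives in either labels or predictions.
    return 2 * tp / denom if denom else 0.0


# Made-up example data, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

f1 = binary_f1(y_true, y_pred)  # TP=2, FP=1, FN=1
print(round(f1, 4))
```

If the package gives a different value on the same labels and predictions, the inputs it receives (for example logits instead of class labels, or a different averaging mode) are a good place to look.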
