The new .log functionality works similarly to how it did when it was in the returned dictionary, but now Lightning automatically aggregates the values you log each step and logs their mean at the end of each epoch if you ask it to. For example, the code you wrote above can be rewritten as:
def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    self.log("loss", loss)
    return loss

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    # on_epoch=True is the default in `validation_step`,
    # so it is not strictly necessary to specify
    self.log("val_loss", loss, on_epoch=True)
This eliminates the need for validation_epoch_end. If for some reason you still wanted to do this aggregation yourself, you could do:
def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    self.log("loss", loss)
    return loss

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    return loss

def validation_epoch_end(self, outs):
    # `outs` is a list of whatever you returned in `validation_step`
    loss = torch.stack(outs).mean()
    self.log("val_loss", loss)
This works equivalently. Hope this clears things up!
Hi Teddy, can I ask how I can do that? Does it happen by default at the end of the epoch, or do I have to specify it? It is not clear to me how to organize the steps!
Hi @Ch-rode, if you log your loss in training_step or validation_step with self.log, then you don't need to implement the validation_epoch_end method (and the same goes for the training step).
Lightning takes care of it by automatically aggregating the loss you logged in {training|validation}_step at the end of each epoch (see the sketch after the flow below).
The flow would be:
Epoch start
Loss computed and logged in training step
Epoch end
Fetch the training step loss and aggregate
Continue next epoch
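To make that flow concrete, here is a minimal sketch using the standard self.log flags. Note reduce_fx="mean" and on_epoch=True are, as far as I know, already the defaults in validation_step; they are spelled out only for clarity:

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    # Accumulate the per-batch values and log only their mean at epoch end;
    # this is what the defaults in `validation_step` already do.
    self.log("val_loss", loss, on_step=False, on_epoch=True, reduce_fx="mean")
    return loss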
Hope I was able to solve your problem.
Also, we have migrated our discussion from this forum to GitHub Discussions. I would request you to ask your questions there for a quicker response.
Although Lightning takes care of it automatically, I don't think the result is entirely correct. It averages the metrics computed on a per-batch basis, but in general we are interested in the value of the metric/loss over the entire validation set. If the batches are not all the same size (e.g. the last batch is smaller), the mean of the per-batch means is not the same as the mean over all samples.
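As a rough sketch of how you could get the exact dataset-level mean yourself (the dictionary keys here are my own, hypothetical names), weight each batch by its size:

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    # Use sum reduction so we can divide by the total sample count later.
    loss_sum = F.cross_entropy(y_hat, y, reduction="sum")
    return {"loss_sum": loss_sum, "n": x.size(0)}

def validation_epoch_end(self, outs):
    total = torch.stack([o["loss_sum"] for o in outs]).sum()
    count = sum(o["n"] for o in outs)
    # True mean over the whole validation set, not a mean of batch means.
    self.log("val_loss", total / count)

If I recall correctly, newer Lightning versions also accept a batch_size argument to self.log, which lets the built-in epoch aggregation weight by batch size instead.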
I don't know if it is specific to the TensorBoardLogger, but if we have something like this:
def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    self.log("train_loss", loss)
    return loss

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    # on_epoch=True is the default in `validation_step`;
    # on_step=True is added to also log per-batch values
    self.log("val_loss", loss, on_step=True, on_epoch=True)
then in TensorBoard we get two plots: val_loss_step and val_loss_epoch. The step in the first plot is not self.global_step but essentially the global step of the validation_dataloader, whereas in the second plot the step is self.global_step. Is this behavior documented somewhere? That is, which "step" is used when logging inside {training,validation}_step, on_{train,validation}_epoch_end, etc.?
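Until that is documented, one possible workaround is to bypass self.log and choose the x-axis yourself. This sketch assumes the TensorBoardLogger, so that self.logger.experiment is a torch.utils.tensorboard SummaryWriter (the tag "val_loss_manual" is just a made-up name):

def validation_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.forward(x)
    loss = F.cross_entropy(y_hat, y)
    # Write directly to TensorBoard with an explicit step, so the
    # per-batch curve is plotted against the trainer's global step.
    self.logger.experiment.add_scalar("val_loss_manual", loss, self.global_step)
    return loss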