Hello,
I am using a LightningModule and a Trainer with multiple metrics from torchmetrics; some are native to the library and some are custom Metric subclasses.
I’m only interested in epoch-level values, so following the documentation's recommendation I only call metric.update() followed by self.log("metric_name", metric_object, on_epoch=True) in the training_step()/validation_step() hooks, roughly as in the sketch below.
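For reference, this is approximately what the logging part of my module looks like (the model, metric choice, and names here are simplified placeholders, not my actual code):

```python
import torch
import pytorch_lightning as pl
import torchmetrics


class MyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(32, 10)  # placeholder model
        # one metric instance per stage, as recommended by torchmetrics
        self.train_acc = torchmetrics.Accuracy(task="multiclass", num_classes=10)
        self.val_acc = torchmetrics.Accuracy(task="multiclass", num_classes=10)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.model(x)
        loss = torch.nn.functional.cross_entropy(logits, y)
        # update only; logging the metric object lets Lightning compute/reset at epoch end
        self.train_acc.update(logits, y)
        self.log("train_acc", self.train_acc, on_step=False, on_epoch=True)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self.model(x)
        self.val_acc.update(logits, y)
        self.log("val_acc", self.val_acc, on_step=False, on_epoch=True)
```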
I use CometML as a logger, but I would also like to save the metric history plots locally, so that I don't depend on CometML alone for visualization. I want these plots to be saved regularly, say every epoch.
Please note that I am using DDP to distribute training across multiple GPUs. Below is a rough sketch of what I had in mind for the local plotting part (the callback name and output path are just placeholders); I'm not sure it is the right design, especially with DDP.
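```python
import os

import matplotlib
matplotlib.use("Agg")  # headless backend so plotting works on a cluster node
import matplotlib.pyplot as plt
import pytorch_lightning as pl


class MetricPlotCallback(pl.Callback):
    """Hypothetical callback: records epoch-level metric values and redraws plots every epoch."""

    def __init__(self, metric_names, out_dir="metric_plots"):
        super().__init__()
        self.metric_names = metric_names
        self.out_dir = out_dir
        self.history = {name: [] for name in metric_names}

    def on_validation_epoch_end(self, trainer, pl_module):
        # skip the sanity-check pass and, under DDP, only write files from rank 0
        if trainer.sanity_checking or not trainer.is_global_zero:
            return
        for name in self.metric_names:
            value = trainer.callback_metrics.get(name)
            if value is not None:
                self.history[name].append(float(value))
        os.makedirs(self.out_dir, exist_ok=True)
        for name, values in self.history.items():
            fig, ax = plt.subplots()
            ax.plot(values)
            ax.set_xlabel("epoch")
            ax.set_ylabel(name)
            fig.savefig(os.path.join(self.out_dir, f"{name}.png"))
            plt.close(fig)
```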
Can someone clarify how to properly design the code in my LightningModule (or callbacks) so that the metrics are logged correctly and their history plots are saved regularly?
Thanks a lot!