Yes! Use `self.log`. This syncs the metric across devices:

```python
def training_step(self, batch, batch_idx):
    self.log('x', x, sync_dist=True)
```
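For context, a minimal sketch of where that call sits inside a `LightningModule` (the module, the metric name, and the loss computation here are made up for illustration):

```python
import torch.nn.functional as F
import pytorch_lightning as pl

class MyModule(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.mse_loss(self(x), y)
        # sync_dist=True reduces the metric across DDP processes before it is logged
        self.log('train_loss', loss, sync_dist=True)
        return loss
```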
@williamfalcon How can I use this with one of the logger classes like `NeptuneLogger`? I can't find any documentation on it. Currently, each DDP process logs separately.
I am also interested in this, both for per-step reporting during training and for reporting at the end of each epoch.
If I call `self.logger.experiment.add_scalar("some_name", some_tensor, global_step=self.global_step)` (when using the TensorBoard logger), I get multiple entries for each time step. That's hard to read, and an aggregated value would be much more useful. If I use `self.logger.experiment.add_histogram("some_name", some_tensor, global_step=self.global_step)`, the histograms do not even appear in TensorBoard because multiple entries are written to the same time step / name tag. Ideally, the tensors being reported should be merged from all processes and then logged jointly.

To be clear, I know how to do the aggregation (e.g. mean for scalars, concat for histograms); I just would like to know how to do this in PyTorch Lightning.
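For reference, something like this sketch is what I mean by merging before logging; it calls `torch.distributed` directly from inside the `LightningModule` (the helper name and the choice to write only from rank 0 are my own assumptions, not an existing Lightning API):

```python
import torch
import torch.distributed as dist

def aggregate_and_log(self, name, tensor):
    # Hypothetical helper meant to live on a LightningModule; assumes the
    # process group has already been initialized by Lightning's DDP backend.
    if dist.is_available() and dist.is_initialized():
        # Concat-style merge (what a histogram needs): gather the tensor from all ranks.
        gathered = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
        dist.all_gather(gathered, tensor)
        merged = torch.cat([t.flatten() for t in gathered])
    else:
        merged = tensor.flatten()
    if self.global_rank == 0:
        # Write a single entry per step instead of one entry per process.
        self.logger.experiment.add_histogram(name, merged, global_step=self.global_step)
        self.logger.experiment.add_scalar(name + "_mean", merged.mean(), global_step=self.global_step)
```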
Currently I am thinking of a workaround: create a class inheriting from `EvalResult` or `TrainResult` that has an additional function like `def custom_log(self, value, fn, sync_dist=True)`, which takes a lambda such as `fn = lambda x: self.logger.experiment.add_scalar("some_name", x, self.global_step)`. This function would then use the standard sync from the base class, but execute the lambda on the aggregated values. Probably not the cleanest API, but it could work.
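A rough sketch of that idea, with the caveat that the class name is made up and the explicit `all_reduce` only stands in for whatever sync the result base class actually performs:

```python
import torch.distributed as dist
from pytorch_lightning import TrainResult  # 0.9-era result object

class CustomLogResult(TrainResult):
    def custom_log(self, value, fn, sync_dist=True):
        # Stand-in for the base class's sync: mean-reduce the value across processes.
        if sync_dist and dist.is_available() and dist.is_initialized():
            value = value.clone()
            dist.all_reduce(value, op=dist.ReduceOp.SUM)
            value = value / dist.get_world_size()
        # Execute the user-supplied lambda (e.g. a logger call) on the aggregated value.
        fn(value)
```

Inside `training_step` this would be used roughly as `result = CustomLogResult(minimize=loss)` followed by `result.custom_log(some_tensor, lambda x: self.logger.experiment.add_scalar("some_name", x, self.global_step))`; presumably one would still guard the lambda so that only rank zero writes the entry.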
@williamfalcon Any update on this? I'm using MLFlow and would also like to log the combined metrics…