Best way to wrap a LightningModule to report generic metrics

Hello, I’d like to collect and publish generic throughput metrics: e.g. collect the number of samples and the duration of each training_step (using extract_batch_size, which is why I need access to the underlying LightningModule), and then log() a metric on training_step_end (current value, running average, average over the last 20 steps, etc.).
An obvious choice seems to be creating a new Callback, but I don’t want the duration metric to be skewed by the duration of other callbacks (or to depend on their order). In other words, I need a guarantee that the timed critical section contains only the LightningModule’s training_step.
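For concreteness, here is a minimal sketch of what I mean by timing only the step itself: plain Python, no Lightning imports, with a hypothetical helper that patches training_step so the critical section excludes any callback overhead. `get_batch_size` stands in for Lightning’s extract_batch_size utility.

```python
import time
from functools import wraps


def time_training_step(module, get_batch_size):
    """Patch `module.training_step` so that only the step itself is timed.

    `get_batch_size` is a stand-in for Lightning's `extract_batch_size`;
    this is an illustrative sketch, not a proposed API.
    """
    original = module.training_step
    durations = []  # per-step wall-clock seconds
    samples = []    # per-step batch sizes

    @wraps(original)
    def timed(batch, *args, **kwargs):
        start = time.perf_counter()
        out = original(batch, *args, **kwargs)   # the only thing timed
        durations.append(time.perf_counter() - start)
        samples.append(get_batch_size(batch))
        return out

    module.training_step = timed
    return durations, samples
```

This gives a tight measurement, but it is exactly the kind of invasive monkey-patching I’d like to avoid, which is why I’m asking about a supported pattern.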
Since this functionality is a cross-cutting concern, it can’t be invasive: we don’t want to alter the original model (which may be third-party, open-source code); we want to wrap it.
I tried wrapping the model in another LightningModule but quickly got stuck on the intricacies of how PyTorch’s nn.Module handles attribute access, as well as how PyTorch Lightning detects which LightningModule hooks are overridden. Does PyTorch Lightning have a boilerplate solution for this?
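To illustrate where I got stuck, here is a stripped-down sketch of the wrapper idea in plain Python (the class and attribute names are illustrative only, not a real LightningModule subclass):

```python
import time


class ThroughputWrapper:
    """Hypothetical delegating wrapper, sketched in plain Python.

    With a real LightningModule, `self._model = model` would go through
    `nn.Module.__setattr__` (which registers submodules specially), and
    Lightning decides which hooks to run by checking whether they are
    overridden on the class, so methods forwarded via `__getattr__` may
    not be picked up. Those are the "intricacies" I ran into.
    """

    def __init__(self, model):
        self._model = model
        self.last_step_duration = None

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails:
        # forward everything else to the wrapped model.
        return getattr(self._model, name)

    def training_step(self, batch, batch_idx):
        # The critical section: time only the wrapped training_step.
        start = time.perf_counter()
        out = self._model.training_step(batch, batch_idx)
        self.last_step_duration = time.perf_counter() - start
        return out
```

The delegation works for a plain object, but once nn.Module’s attribute machinery and Lightning’s hook detection are involved it breaks down, hence my question.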
I also looked into plugins, but they seem to be tightly coupled to the strategies.
What’s the idiomatic PyTorch Lightning solution to a problem like this?
Thank you!