I would like to compare the performance of my model with different variations of EMA (exponential moving average of weights) and with the regular weights. With pre-2.0 Lightning, it was possible to extend the TrainingEpochLoop to achieve this.
Are there any options for this besides writing a custom trainer?
What I’ve come up with on my own:
- Copying the model and trainer inside a callback and running `cloned_trainer.validate(model_with_ema_weights, val_dataloader)`
  - Requires reconnecting the loggers and other experiment-tracking features to the cloned trainer. It also wastes memory (two copies of the weights have to be loaded briefly).
- Using `val_check_interval=1.0/num_validations` and `on_validation_start` to modify the weights in-place
  - The validation runs are not directly comparable, because the model has seen `1.0-1/num_validations` more training data on one validation than on another.
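To make the in-place variant concrete, here is a minimal, framework-free sketch of the swap-and-restore logic I have in mind. `EmaSwapper` is a hypothetical helper, not a Lightning API, and plain floats stand in for tensors. In a real callback I would expect `update()` to be called from `on_train_batch_end` and `swap()` from `on_validation_start` and `on_validation_end`, operating on the module's parameters instead of a dict. It does not address the comparability problem above.

```python
class EmaSwapper:
    """Hypothetical helper: keeps an EMA of a weight dict and swaps it in place.

    Plain floats stand in for tensors; a real version would work on the
    parameters of the LightningModule instead.
    """

    def __init__(self, weights, decay=0.9):
        self.weights = weights        # live training weights, mutated in place
        self.decay = decay
        self.shadow = dict(weights)   # EMA copy, initialised from the live weights
        self.swapped = False

    def update(self):
        # Call after each optimizer step (e.g. from on_train_batch_end).
        if self.swapped:
            raise RuntimeError("EMA weights are swapped in; restore them first")
        for name, value in self.weights.items():
            self.shadow[name] = (
                self.decay * self.shadow[name] + (1.0 - self.decay) * value
            )

    def swap(self):
        # Exchange live and EMA values entry by entry, so no third copy is needed.
        for name in self.weights:
            self.weights[name], self.shadow[name] = (
                self.shadow[name],
                self.weights[name],
            )
        self.swapped = not self.swapped


weights = {"w": 1.0}
ema = EmaSwapper(weights, decay=0.5)

weights["w"] = 3.0           # pretend an optimizer step happened
ema.update()                 # shadow becomes 0.5 * 1.0 + 0.5 * 3.0

ema.swap()                   # e.g. on_validation_start: validate with EMA weights
ema_value = weights["w"]
ema.swap()                   # e.g. on_validation_end: restore training weights
restored_value = weights["w"]
```

Calling `swap()` twice returns the live weights to their training values, so the same model object can be validated with either set without cloning the trainer.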
Running multiple validations at the end of training is an option, but is it possible to have that information during the run?