Adding instructions before and after each training loop

fferdaus · June 12, 2024, 4:23am

Hi,
I am using the following code snippet to train UNet on an Intel Gaudi accelerator. I want to execute some additional lines of code before and after each training/validation epoch, but I am not sure which file I need to modify. I believe I need to modify /usr/local/lib/python3.10/dist-packages/lightning/pytorch/loops/fit_loop.py, but I am not sure exactly where to make the change.

from lightning.pytorch import Trainer
trainer = Trainer(
	logger=False,
	profiler=prof,
	precision="bf16-mixed" if args.amp else "32-true",
	devices=args.hpus if args.hpus else None,
	accelerator=HPUAccelerator() if args.hpus else None,
	benchmark=True,
	deterministic=False,
	min_epochs=args.min_epochs,
	max_epochs=args.max_epochs,
	sync_batchnorm=args.sync_batchnorm,
	gradient_clip_val=args.gradient_clip_val,
	callbacks=callbacks,
	num_sanity_val_steps=0,
	default_root_dir=args.results,
	enable_checkpointing=args.save_ckpt,
	strategy=(
		HPUParallelStrategy(
			parallel_devices=parallel_hpus,
			bucket_cap_mb=args.bucket_cap_mb,
			gradient_as_bucket_view=True,
			static_graph=True,
		)
		if args.hpus > 1
		else SingleHPUStrategy() if args.hpus == 1 else None
	),
	limit_train_batches=1.0 if args.train_batches == 0 else args.train_batches,
	limit_val_batches=1.0 if args.test_batches == 0 else args.test_batches,
	limit_test_batches=1.0 if args.test_batches == 0 else args.test_batches,
	)
trainer.fit(model, train_dataloaders=train_dl)

The main implementation can be found here.
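
For what it's worth, Lightning's callback hooks may make editing fit_loop.py unnecessary. Below is a minimal sketch, not a definitive solution: the class name EpochBoundaryCallback and its events list are hypothetical names made up for illustration, and the timing/print statements stand in for whatever extra code needs to run. The try/except fallback is only there so the sketch can be run without Lightning installed.

```python
import time

try:
    from lightning.pytorch.callbacks import Callback
except ImportError:
    # Fallback (assumption for illustration only): lets the sketch run
    # without Lightning installed. With Lightning, the real base class is used.
    Callback = object


class EpochBoundaryCallback(Callback):
    """Hypothetical callback that runs extra code around each epoch."""

    def __init__(self):
        super().__init__()
        self.events = []          # record of which hooks fired, in order
        self._train_start = None  # wall-clock start of the current train epoch

    def on_train_epoch_start(self, trainer, pl_module):
        # Code placed here runs before each training epoch.
        self._train_start = time.perf_counter()
        self.events.append("train_epoch_start")

    def on_train_epoch_end(self, trainer, pl_module):
        # Code placed here runs after each training epoch.
        elapsed = time.perf_counter() - self._train_start
        self.events.append("train_epoch_end")
        print(f"training epoch took {elapsed:.2f}s")

    def on_validation_epoch_start(self, trainer, pl_module):
        # Code placed here runs before each validation epoch.
        self.events.append("validation_epoch_start")

    def on_validation_epoch_end(self, trainer, pl_module):
        # Code placed here runs after each validation epoch.
        self.events.append("validation_epoch_end")
```

With the snippet above, this would presumably be wired in by appending an instance to the existing list before building the Trainer, e.g. callbacks.append(EpochBoundaryCallback()). The validation hooks only fire when a validation loop actually runs, which requires a validation dataloader (passed to trainer.fit or defined on the model).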
