Hello there.
While debugging my model, I noticed that `optimizer_step` takes more time than the inference and backward steps when using `profiler="simple"`. Then, using the `PyTorchProfiler` and the TensorBoard trace, I found that this is because the `optimizer_step` span begins at the same time as `run_training_batch` and ends almost immediately after the backward step finishes. Is this expected behavior, or how can I fix it?
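For reference, here is roughly how the two profilers were enabled (a sketch, not my exact training code; the `PyTorchProfiler` import path and the `dirpath`/`filename` arguments are assumptions based on recent Lightning versions):

```python
import pytorch_lightning as pl
from pytorch_lightning.profilers import PyTorchProfiler  # older versions: pytorch_lightning.profiler

# 1) Simple profiler: reports total wall-clock time per hook,
#    which is where optimizer_step showed up as the largest entry.
trainer = pl.Trainer(profiler="simple", max_epochs=1)

# 2) PyTorch profiler: writes a trace viewable in TensorBoard,
#    which is where the optimizer_step span was seen overlapping run_training_batch.
profiler = PyTorchProfiler(dirpath="profiler_logs", filename="trace")
trainer = pl.Trainer(profiler=profiler, max_epochs=1)
```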