Time share of each module in language model pre-training

I want to know what share of total training time each module takes during language model pre-training, for example Attention. How much does it affect total training time? Are there any docs or links that could help me?
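
To make the question concrete, here is a rough back-of-the-envelope sketch of the kind of breakdown I mean. It counts forward-pass matmul FLOPs per token for one standard transformer layer, assuming full (non-causal) attention and an MLP with a 4x hidden expansion, and ignoring embeddings, the LM head, layer norms, and softmax. The function name `layer_flop_shares` and the parameter `ffn_mult` are made up for illustration. FLOPs are only a proxy: real wall-clock shares depend on kernels, memory bandwidth, and hardware, so this is not a measurement.

```python
def layer_flop_shares(d_model: int, seq_len: int, ffn_mult: int = 4) -> dict:
    """Estimate per-token forward matmul FLOP shares for one transformer layer.

    Assumptions (illustrative only): full attention over seq_len tokens,
    a two-layer MLP with hidden size ffn_mult * d_model, and one
    multiply-add counted as 2 FLOPs.
    """
    # Q, K, V, and output projections: four d_model x d_model matmuls.
    attn_proj = 8 * d_model**2
    # QK^T scores plus the weighted sum over V: each is 2 * seq_len * d_model.
    attn_core = 4 * seq_len * d_model
    # MLP: d_model -> ffn_mult * d_model -> d_model.
    mlp = 4 * ffn_mult * d_model**2

    total = attn_proj + attn_core + mlp
    return {
        "attention_projections": attn_proj / total,
        "attention_core": attn_core / total,
        "attention_total": (attn_proj + attn_core) / total,
        "mlp": mlp / total,
    }


if __name__ == "__main__":
    # Hypothetical model size chosen only as an example.
    for name, share in layer_flop_shares(d_model=4096, seq_len=2048).items():
        print(f"{name}: {share:.1%}")
```

Under these assumptions the attention share grows with sequence length, since the `attn_core` term scales with `seq_len` while the projection and MLP terms do not. What I am looking for is measured numbers like this (e.g. from a profiler) rather than estimates.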

Closing since it's a duplicate of "The time proportion of each module in pre-training process".