I want to know what share of total training time each module accounts for during language model pre-training, for example the Attention module. How much does it contribute to the overall training time? GPT-4 gave me an answer (shown in the picture), but I'm worried it may be hallucinating. Is there any documentation, or are there any links, that can help me verify it?