Difference between BertForSequenceClassification and Bert + nn.Linear

The first thing I'd suggest is to check whether the backbone has the same weights in both cases. It looks correct to me, but verify it once anyway. Second, check whether the dropout rate is the same in both models. Also, you may need to check this too.
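Here is a rough sketch of how those two checks could look. It assumes the custom model is a `BertModel` followed by dropout and an `nn.Linear` head. The `BertWithLinear` class, the helper names `backbones_match` and `dropout_rates`, and the tiny randomly initialised config are all made up for illustration, so swap in your own models and checkpoint. For reference, `BertForSequenceClassification` applies dropout to the pooled output before its classifier, which is why the head's dropout is worth comparing.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel, BertForSequenceClassification

# Tiny random config (an assumption) so the sketch runs without downloading weights.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)


class BertWithLinear(nn.Module):
    """Hypothetical stand-in for the 'Bert + nn.Linear' model in the question."""

    def __init__(self, config):
        super().__init__()
        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, input_ids, attention_mask=None):
        pooled = self.bert(input_ids, attention_mask=attention_mask).pooler_output
        return self.classifier(self.dropout(pooled))


def backbones_match(a, b):
    """Return True if two modules have identical parameter names and values."""
    sd_a, sd_b = a.state_dict(), b.state_dict()
    if sd_a.keys() != sd_b.keys():
        return False
    return all(torch.equal(sd_a[k], sd_b[k]) for k in sd_a)


def dropout_rates(model):
    """Collect the probability of every nn.Dropout module, keyed by module name."""
    return {
        name: m.p for name, m in model.named_modules() if isinstance(m, nn.Dropout)
    }


hf_model = BertForSequenceClassification(config)
custom_model = BertWithLinear(config)

# Two independently initialised backbones differ; this is the case to watch for.
same_before = backbones_match(hf_model.bert, custom_model.bert)

# Copy the backbone weights across, then compare again.
custom_model.bert.load_state_dict(hf_model.bert.state_dict())
same_after = backbones_match(hf_model.bert, custom_model.bert)

print("backbone identical before copy:", same_before)
print("backbone identical after copy:", same_after)
print("head dropout:", hf_model.dropout.p, "vs", custom_model.dropout.p)
print("all dropout rates equal:", dropout_rates(hf_model) == dropout_rates(custom_model))
```

With real checkpoints you would load both models with `from_pretrained` and run the same two comparisons. Also remember to call `.eval()` on both before comparing outputs, since dropout is only active in training mode.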