Gradient checkpointing + ddp = NaN