FullyShardedDataParallel no memory decrease

Thanks, but I’m already configuring the optimiser like that:

from torch.optim import SGD

def configure_optimizers(self):
    # Point the optimiser at the FSDP-wrapped parameters via self.trainer.model
    return SGD(self.trainer.model.parameters(), lr=1e-3, momentum=0.9)
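For context, here is a minimal sketch of the kind of setup I mean. The model, hyperparameters, and the strategy="fsdp" flag are purely illustrative and assume Lightning >= 2.0, not my actual code:

# Illustrative sketch only -- assumes Lightning >= 2.0, where strategy="fsdp"
# selects the native FullyShardedDataParallel strategy. Model and
# hyperparameters are placeholders.
import lightning.pytorch as pl
from torch import nn
from torch.optim import SGD

class BigModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # A deliberately large stack so there is something worth sharding.
        self.model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(16)])

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.model(x), y)

    def configure_optimizers(self):
        # self.trainer.model is the FSDP-wrapped module at this point.
        return SGD(self.trainer.model.parameters(), lr=1e-3, momentum=0.9)

trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="fsdp", max_epochs=1)
# trainer.fit(BigModule(), train_dataloaders=...)  # dataloader omitted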

I’ll post that GitHub bug report. Thanks for your help!

Brett