FullyShardedDataParallel no memory decrease

This is likely because you need to change your configure_optimizers from

def configure_optimizers(self):
    return YourOptimizer(self.parameters(), ...)

to

def configure_optimizers(self):
    return YourOptimizer(self.trainer.model.parameters(), ...)

This is a quirk that is currently necessary in Lightning: the optimizer has to be built from the parameters of the wrapped, sharded model (self.trainer.model) rather than from the unwrapped module, otherwise it holds references to the full, unsharded parameters and you see no memory decrease. As FSDP matures, this workaround won’t be necessary anymore.
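For context, here is a minimal sketch of how this looks end to end. The model, layer sizes, optimizer choice, and the "fsdp" strategy string are illustrative assumptions (the exact strategy name can vary between Lightning versions), not part of the fix itself:

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        # Build the optimizer from the FSDP-wrapped model's parameters,
        # not from self.parameters(), so it references the sharded,
        # flattened parameters instead of the full-size originals.
        return torch.optim.AdamW(self.trainer.model.parameters(), lr=1e-3)

trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="fsdp")

Training then proceeds as usual with trainer.fit; the only FSDP-specific change is where configure_optimizers takes its parameters from.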

If this doesn’t solve your issue, please feel free to open a bug report on GitHub and we can take a closer look.