Manual Optimization with DeepSpeed

I was previously using automatic optimization with DeepSpeed, and it worked very well for me. My training loop is now more complex (it involves RL), so I can no longer use automatic optimization. My script currently uses DeepSpeed via DeepSpeedStrategy, and my LightningModule does manual optimization. I initially hit some bugs where certain parameters were on the CPU instead of CUDA. To work around this, I now move my inputs to self.device explicitly in my LightningModule. I am not sure whether Lightning still handles device placement under manual optimization.

The script now runs, but it is extremely slow and GPU utilization is minimal. No error is thrown, so I have not been able to track down the cause. Does Lightning support DeepSpeed with manual optimization? If not, what would you recommend that requires minimal changes to my current code (for example, plain PyTorch with native DeepSpeed)?
Thank you!