@andrasiani Yes, with FSDP it is possible to wrap manually. Have you tried this?
https://lightning.ai/docs/pytorch/stable/advanced/model_parallel.html#manual-wrapping
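Roughly, manual wrapping boils down to calling `wrap()` on the submodules you want sharded. Here is a minimal torch-level sketch (this is what Lightning drives for you when you wrap layers in the model hook; the tiny model here is made up for illustration):

```python
# Sketch of manual FSDP wrapping at the torch level; the model is hypothetical.
import torch.nn as nn
from torch.distributed.fsdp.wrap import wrap

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

# Inside the strategy's enable_wrap context, wrap() puts the submodule into
# its own FSDP unit. Outside that context it is a no-op, so the same model
# code also works for plain single-device runs.
model[0] = wrap(model[0])
```

So you can be selective: only the layers you pass through `wrap()` get their own FSDP unit; everything else lands in the outer wrapper.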
The same should be possible with the `auto_wrap_policy` too. And yes, afaik it is normal that the top-level module gets wrapped with FSDP.
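For the policy route, something like this should work (the `auto_wrap_policy` parameter name matches Lightning's `FSDPStrategy`; the 100k threshold is just an arbitrary example):

```python
# Sketch: control wrapping via an auto_wrap_policy instead of manual wrap().
import functools
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

# Every submodule with at least 100k parameters becomes its own FSDP unit;
# whatever is left over ends up in the top-level FSDP wrapper.
policy = functools.partial(size_based_auto_wrap_policy, min_num_params=100_000)

# Hypothetical trainer setup:
# from lightning.pytorch.strategies import FSDPStrategy
# trainer = L.Trainer(strategy=FSDPStrategy(auto_wrap_policy=policy))
```

There is also a transformer-layer-based policy in `torch.distributed.fsdp.wrap` if you want to wrap by layer class rather than by size.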
For DeepSpeed, I don't think it is possible to control that, but I haven't checked in detail.