FSDP for both pretrained teacher and trainable student

Is it OK to manually wrap both the teacher and the student models with FSDP?
If so, will each model's weights be split into roughly equal shards and partitioned across the 6 GPUs?
The problem is that I am not seeing any memory reduction with automatic wrapping.
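For reference, here is a minimal sketch of what I mean by manually wrapping both models. The `nn.Sequential` stacks are just placeholders standing in for my actual teacher/student architectures, and I am assuming the default `FULL_SHARD` strategy over the default process group:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Launched with: torchrun --nproc_per_node=6 this_script.py
dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

# Placeholder modules standing in for my real architectures.
teacher = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)).cuda()
student = nn.Sequential(nn.Linear(4096, 1024), nn.ReLU(), nn.Linear(1024, 4096)).cuda()

# The teacher is fully frozen; only the student trains.
teacher.requires_grad_(False)
teacher.eval()

# Manually wrap each model in its own FSDP instance; with the default
# FULL_SHARD strategy I would expect each rank to hold ~1/6 of each
# model's parameters.
teacher = FSDP(teacher)
student = FSDP(student)
```

Is this the right way to do it, or do the two FSDP instances interfere with each other?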
I would really appreciate your guidance; thanks in advance.
I also created this issue where I explain my models in more detail: FSDP not reducing memory for non-trainable submodule