How to split my PyTorch model across different GPUs?

Dear @Hannibal046,

For 1: currently, model parallelism is supported in Lightning, but only for sequential models.
You could convert your encoder/decoder into a single sequential module and use our Sequential Model Parallelism beta feature: https://pytorch-lightning.readthedocs.io/en/stable/multi_gpu.html#sequential-model-parallelism-with-checkpointing.
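To illustrate the first step, here is a minimal sketch of flattening an encoder/decoder pair into one `nn.Sequential`, assuming both parts are themselves sequential stacks of layers (the layer sizes below are placeholders, not taken from your model):

```python
import torch
import torch.nn as nn

# Placeholder encoder/decoder standing in for your own modules.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
decoder = nn.Sequential(nn.Linear(64, 32))

# Flatten both parts into a single nn.Sequential so the pipeline
# feature can place each stage on a different GPU.
model = nn.Sequential(*encoder, *decoder)

x = torch.randn(4, 32)
out = model(x)
print(tuple(out.shape))  # (4, 32)
```

If your encoder/decoder are not plain layer stacks, you would need to refactor their `forward` logic into a flat sequence of modules first.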

If you are working with multiple GPUs, ddp_sharded can help too, as it shards gradients across GPUs and reduces the memory footprint.
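Enabling it is a one-line change to the Trainer (a sketch of the stable-release API at the time; `MyLightningModule` and `gpus=2` are placeholders for your own module and device count):

```python
import pytorch_lightning as pl

trainer = pl.Trainer(gpus=2, plugins='ddp_sharded')
# trainer.fit(MyLightningModule())  # placeholder module
```

No changes to the LightningModule itself are needed; the plugin handles sharding internally.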

For 2: did you use plugins='ddp_sharded'? Could you share an image of the peak memory with and without it for your own model?

Best,
T.C