DeepSpeed stage 3 partition_activations brings no benefit

Hi, I tried out Sean Naren’s repo (deepspeed branch), upgraded to PL 1.9.4 and DeepSpeed 0.9.3, but the partition_activations flag within activation checkpointing produces no memory reduction.
I see some fishy behavior in deepspeed/runtime/activation_checkpointing/checkpointing.py, in the get_partition_size() method: the partition size on rank 0 is the same as the size of the whole layer, and mp_size is 1 despite running on 2 GPUs.
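For reference, that method essentially just divides the tensor size by mp_size (paraphrased below from checkpointing.py; the exact code may differ between versions), so with mp_size == 1 the "partition" is the whole tensor:

```python
# Paraphrased from deepspeed/runtime/activation_checkpointing/checkpointing.py
# (may differ slightly between versions). mp_size is derived from the mpu
# object handed to deepspeed.checkpointing.configure(); with mpu_=None it
# falls back to 1, so nothing is actually partitioned.
def get_partition_size(item):
    size = item.numel()
    partition_size = size / mp_size  # mp_size == 1 -> partition == whole layer
    return int(partition_size)
```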
I think it is because of this line in /home/local/CORP/aiani/anaconda3/envs/mingpt_sean_naren/lib/python3.7/site-packages/pytorch_lightning/strategies/deepspeed.py:
```python
deepspeed.checkpointing.configure(
    mpu_=None,
```
Why None?
The DeepSpeed docs describe mpu as: “An object that implements the following methods: get_model_parallel_rank/group/world_size, and get_data_parallel_rank/group/world_size”
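If I read that contract right, even a thin wrapper around torch.distributed process groups should satisfy it. A minimal sketch (SimpleMPU and its constructor arguments are my own hypothetical names, not from any library):

```python
import torch.distributed as dist

class SimpleMPU:
    """Hypothetical minimal object satisfying the mpu contract quoted above."""

    def __init__(self, model_parallel_group, data_parallel_group):
        self._mp_group = model_parallel_group
        self._dp_group = data_parallel_group

    # model-parallel side
    def get_model_parallel_rank(self):
        return dist.get_rank(group=self._mp_group)

    def get_model_parallel_group(self):
        return self._mp_group

    def get_model_parallel_world_size(self):
        return dist.get_world_size(group=self._mp_group)

    # data-parallel side
    def get_data_parallel_rank(self):
        return dist.get_rank(group=self._dp_group)

    def get_data_parallel_group(self):
        return self._dp_group

    def get_data_parallel_world_size(self):
        return dist.get_world_size(group=self._dp_group)
```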

See the Megatron implementation; it passes a real mpu object:
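From memory (argument names may vary across Megatron/DeepSpeed versions), their setup hands the mpu module straight to configure, which is what makes mp_size > 1:

```python
import deepspeed
from megatron import mpu  # Megatron's model-parallel utility module

# mpu_ is the first argument of deepspeed.checkpointing.configure();
# passing a real mpu is what gives mp_size > 1 so activations get split.
deepspeed.checkpointing.configure(
    mpu,
    partition_activations=True,
    contiguous_checkpointing=True,
    checkpoint_in_cpu=False,
    profile=False,
)
```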

I’d really appreciate some help here, guys! Thanks!