How to run Trainer.fit() and Trainer.test() in DDP distributed mode

You cannot run trainer.test after trainer.fit (or multiple trainer.fit/trainer.test calls in general) in ddp mode; this only works with ddp_spawn. You need to do one of the following:

  1. remove the trainer.test call,
  2. move the trainer.test call to a separate test script (see the first sketch below), or
  3. choose ddp_spawn (which has its own limitations; see the second sketch below).
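
For option 2, here is a minimal sketch of splitting fit and test into two scripts. `MyModel`, `MyDataModule`, and the checkpoint path are hypothetical placeholders for your own classes and paths, and the `gpus=`/`accelerator=` arguments assume a Lightning version that still accepts `accelerator="ddp"`:

```python
# train.py -- option 2: this script only runs fit() under ddp.
# MyModel / MyDataModule are hypothetical placeholders for your own
# LightningModule and LightningDataModule.
import pytorch_lightning as pl
from my_project import MyModel, MyDataModule  # hypothetical imports

if __name__ == "__main__":
    model = MyModel()
    dm = MyDataModule()
    trainer = pl.Trainer(gpus=2, accelerator="ddp", max_epochs=10)
    trainer.fit(model, datamodule=dm)
    # note: no trainer.test() here -- it cannot follow fit() in ddp mode
```

```python
# test.py -- option 2 continued: a separate script that only runs test(),
# loading the checkpoint written by train.py (the path is a placeholder).
import pytorch_lightning as pl
from my_project import MyModel, MyDataModule  # hypothetical imports

if __name__ == "__main__":
    model = MyModel.load_from_checkpoint("path/to/checkpoint.ckpt")
    dm = MyDataModule()
    trainer = pl.Trainer(gpus=2, accelerator="ddp")
    trainer.test(model, datamodule=dm)
```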

This is simply a limitation of multiprocessing and a trade-off between ddp and ddp_spawn.
More information is in this section, towards the bottom:
https://pytorch-lightning.readthedocs.io/en/latest/multi_gpu.html#distributed-data-parallel
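
For option 3, the fit-then-test flow can stay in a single script if you switch the backend to ddp_spawn. Again a sketch, not a definitive recipe: the argument names assume the same Lightning version as above, and `MyModel`/`MyDataModule` remain hypothetical placeholders.

```python
# run.py -- option 3: ddp_spawn allows fit() followed by test() in one
# script, at the cost of ddp_spawn's own limitations (e.g. the model and
# dataloaders must be picklable).
import pytorch_lightning as pl
from my_project import MyModel, MyDataModule  # hypothetical imports

if __name__ == "__main__":
    model = MyModel()
    dm = MyDataModule()
    trainer = pl.Trainer(gpus=2, accelerator="ddp_spawn", max_epochs=10)
    trainer.fit(model, datamodule=dm)
    trainer.test(model, datamodule=dm)
```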