How to properly use multiple trainers with ddp in one script?


For multi-GPU training with ddp, Lightning launches multiple processes, each of which runs the script from scratch. However, this causes deadlocks if I create multiple trainers and call their fit() sequentially. What's the proper way to use multiple trainers with ddp in one script?
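
To make the setup concrete, here is a toy model of the pattern, not actual Lightning code. FakeTrainer, script_body, simulate_ddp, and ranks_aligned are hypothetical names made up for illustration, and the only assumption is the one stated above: every rank runs the whole script from the top. It shows that each rank ends up calling fit() on both trainers, and it checks that all ranks reach the same fit() calls in the same order, which is generally needed for collective operations not to hang. It does not reproduce or diagnose the deadlock itself.

```python
from collections import defaultdict


class FakeTrainer:
    """Hypothetical stand-in for a trainer; it only records fit() calls."""

    def __init__(self, name, rank, log):
        self.name = name
        self.rank = rank
        self.log = log

    def fit(self):
        # A real ddp fit() would synchronize with the other ranks here.
        self.log[self.rank].append(self.name)


def script_body(rank, log):
    """The user's script: two trainers, fitted one after the other."""
    FakeTrainer("trainer_a", rank, log).fit()
    FakeTrainer("trainer_b", rank, log).fit()


def simulate_ddp(world_size):
    """Run the whole script once per rank (assumption taken from the post)."""
    log = defaultdict(list)
    for rank in range(world_size):
        script_body(rank, log)
    return dict(log)


def ranks_aligned(log):
    """True if every rank reached the same fit() calls in the same order."""
    sequences = list(log.values())
    return all(seq == sequences[0] for seq in sequences)


calls = simulate_ddp(world_size=2)
print(calls)                 # {0: ['trainer_a', 'trainer_b'], 1: ['trainer_a', 'trainer_b']}
print(ranks_aligned(calls))  # True
```

With two ranks, the script body runs twice, so there are four fit() calls in total rather than two.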

I ask because our MultiModalPredictor (AutoGluon Multimodal - Quick Start - AutoGluon 0.8.2 documentation) uses Lightning as its backend. Behind each MultiModalPredictor API call, such as fit(), predict(), and predict_proba(), we create a Lightning trainer and call the trainer's fit() or predict() API. Users generally make multiple MultiModalPredictor calls in one script, which leads to my question here. Thanks!