Multi-GPU - Autolog with multiple runs - Lightning 2.0

We are working on upgrading the MLflow PyTorch examples to be compatible with Lightning 2.0.

Previously, we had a trainer.global_rank check to autolog only on the rank 0 GPU (to avoid multiple runs).

To be compatible with Lightning 2.0, I am upgrading the script to use LightningCLI - Script Link

When I run the script on 4 GPUs, multiple MLflow runs get created. Since the script uses LightningCLI, it no longer has access to the trainer object.

How can I invoke mlflow.pytorch.autolog on the rank 0 GPU alone (to avoid duplicate MLflow runs)?

There is a dedicated logger for MLflow in Lightning:

from lightning.pytorch import Trainer
from lightning.pytorch.cli import LightningCLI
from lightning.pytorch.loggers import MLFlowLogger

trainer = Trainer(logger=MLFlowLogger(...))

# or in the CLI
cli = LightningCLI(
    ...
    trainer_defaults={"logger": MLFlowLogger(...)},
)
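With LightningCLI, the logger can also be configured from the YAML config file instead of `trainer_defaults`, using the standard `class_path`/`init_args` syntax. A sketch, assuming a local file-based tracking store; the experiment name and tracking URI values are placeholders, not from the original script:

```yaml
# config.yaml — passed to the script as: python script.py fit --config config.yaml
trainer:
  logger:
    class_path: lightning.pytorch.loggers.MLFlowLogger
    init_args:
      experiment_name: my_experiment   # placeholder name
      tracking_uri: file:./mlruns      # placeholder local tracking store
```

This keeps the logger choice out of the code, so the same script can run with or without MLflow logging.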

Otherwise, if you really want to use the autolog, you could try this on line 240 of the script you linked:

if cli.trainer.global_rank == 0:
    mlflow.pytorch.autolog()
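If autolog must be enabled before the trainer exists (e.g. before constructing the `LightningCLI`), one option is to gate on the rank environment variables that Lightning's DDP launchers and `torchrun` export to worker processes. A minimal sketch, assuming a standard env-var launch; the helper name `maybe_autolog` is hypothetical, and the actual `mlflow.pytorch.autolog()` call is left commented so the sketch stands alone:

```python
import os


def maybe_autolog() -> bool:
    """Return True (and enable autolog) only on the global rank 0 process.

    Assumption: the launcher exports LOCAL_RANK / NODE_RANK in worker
    processes; when neither is set, we are in the single (rank 0) process.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    node_rank = int(os.environ.get("NODE_RANK", 0))
    if local_rank == 0 and node_rank == 0:
        # mlflow.pytorch.autolog()  # uncomment where mlflow is available
        return True
    return False
```

Calling this at the top of the script, before `LightningCLI(...)`, avoids needing `cli.trainer.global_rank` at all.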

Thank you very much @awaelchli. At some point, mlflow.pytorch.autolog needs to be upgraded to use MLFlowLogger.
