I’m looking for a good way to sync my output dir name (which contains a timestamp, etc.) between DDP processes. For now, I’m doing something like this:
```python
import os
from datetime import datetime

import dateutil.tz

# LOCAL_RANK comes in as a string from the launcher, so cast it to int
local_rank = int(os.environ.get('LOCAL_RANK', 0))
if local_rank == 0:
    now = datetime.now(dateutil.tz.tzlocal())
    timestamp = now.strftime('%Y_%m_%d_%H_%M_%S')
    run_output_dir = os.path.join(
        cfg.output_dir,
        '%s_%s_%s_%s' % (cfg.dataset, cfg.cfg_name, timestamp, cfg.seed))
    os.environ['RUN_OUTPUT_DIR'] = run_output_dir
else:
    run_output_dir = os.environ['RUN_OUTPUT_DIR']
```
Is this OK, or does someone have a better solution?
I’ve tried to use `torch.distributed.send` and `torch.distributed.recv`, but these only work for tensors.
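One workaround I’ve considered for the tensor-only restriction is encoding the string as a fixed-size byte tensor and broadcasting that. A rough, untested sketch (`broadcast_string` is just a helper name I made up; it assumes the process group is already initialized, and with the NCCL backend the buffer would need to live on the current CUDA device):

```python
import torch
import torch.distributed as dist

def broadcast_string(s: str = '', src: int = 0, max_len: int = 256) -> str:
    """Broadcast a string from rank `src` to all ranks via a byte tensor."""
    buf = torch.zeros(max_len, dtype=torch.uint8)  # move to CUDA for NCCL
    if dist.get_rank() == src:
        data = s.encode('utf-8')
        assert len(data) <= max_len, 'string too long for the buffer'
        buf[:len(data)] = torch.tensor(list(data), dtype=torch.uint8)
    dist.broadcast(buf, src=src)
    # Trim the zero padding and decode back to a string
    return bytes(buf.tolist()).rstrip(b'\x00').decode('utf-8')

# Every rank ends up with rank 0's directory name
run_output_dir = broadcast_string(
    run_output_dir if dist.get_rank() == 0 else '', src=0)
```

But the fixed buffer size feels clunky for what should be a simple handoff.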
I’m also using the `WandbLogger`, so I have considered having all processes save output to `wandb_logger.experiment.dir`, but that doesn’t work because the logger returns a dummy experiment in all but the main process (link).
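One thing that might sidestep the tensor restriction entirely is `torch.distributed.broadcast_object_list`, which broadcasts arbitrary picklable Python objects. Roughly what I have in mind (a sketch, assuming the process group is already initialized and `run_output_dir` is built on rank 0 as in my snippet above):

```python
import torch.distributed as dist

# Rank 0 supplies the real name; the other ranks pass a placeholder
# that gets overwritten in place by the broadcast.
objects = [run_output_dir if dist.get_rank() == 0 else None]
dist.broadcast_object_list(objects, src=0)
run_output_dir = objects[0]
```

Would that be the recommended approach, or is there something more idiomatic?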