Hi,
I am trying to log with distributed_backend=ddp_cpu, but nothing appears. I suspect this is due to multiprocessing shenanigans, because the same code works correctly with ddp.
Does anybody know how to fix it? I want this so I can test that things are logged correctly with more than one process, without needing a GPU.
Code
import logging

import pytorch_lightning as pl


class LoggingCallback(pl.Callback):
    def __init__(self):
        super().__init__()
        logging.basicConfig(level=logging.INFO)

    @pl.utilities.rank_zero_only
    def on_batch_start(self, trainer, pl_module):
        super().on_batch_start(trainer, pl_module)
        logging.info("Logged message!")


if __name__ == "__main__":
    trainer = pl.Trainer(
        max_epochs=2,
        limit_train_batches=1,
        limit_val_batches=1,
        progress_bar_refresh_rate=0,
        weights_summary=None,
        logger=False,
        callbacks=[LoggingCallback()],
        # Works correctly if the following 2 arguments are not set,
        # meaning "Logged message!" is logged twice
        num_processes=2,
        distributed_backend="ddp_cpu",
    )
    trainer.fit(DummyModule(batch_size=1))  # DummyModule is a linear layer on MNIST
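
For what it's worth, here is a minimal sketch of what I think is happening, using torch.multiprocessing.spawn directly. I'm assuming ddp_cpu spawns its workers in roughly this way; the worker function and nprocs value below are just for illustration, not taken from Lightning's internals.

import logging

import torch.multiprocessing as mp


def worker(rank):
    # The spawned child starts from a fresh interpreter, so the parent's
    # logging configuration is gone. With no handlers configured, logging's
    # last-resort handler only emits WARNING and above, so this INFO record
    # is silently dropped.
    logging.info("Hello from rank %d", rank)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)  # configures the parent process only
    logging.info("Hello from the parent")    # this message shows up
    mp.spawn(worker, nprocs=2)               # the children stay silent

If that is really what is going on, I guess the logging setup has to happen inside each spawned process rather than in the parent, but I'm not sure what the idiomatic place for that is in Lightning.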