Thanks all for your help! Can these discussions be summed up as follows?
- Return only lightweight objects, not heavy ones, or your memory will be used up quickly.
- Use TorchMetrics to compute metrics, because it synchronizes metric states across processes automatically.
- If metrics are self-implemented and computed in callbacks, they may be computed separately by each process in distributed training, which can lead to inconsistent results.
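To illustrate the first point, here is a minimal plain-Python sketch, assuming the per-step values have already been reduced to plain floats (in PyTorch that would typically be something like `loss.item()`). Instead of keeping every step's output in a list, it keeps only a running sum and count, so memory use stays constant over the epoch. `RunningMean` and `run_epoch` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunningMean:
    """Keeps only a running sum and a count instead of every step's output."""
    total: float = 0.0
    count: int = 0

    def update(self, value: float, n: int = 1) -> None:
        # `value` is assumed to be the mean over a batch of `n` samples.
        self.total += value * n
        self.count += n

    def compute(self) -> float:
        if self.count == 0:
            raise ValueError("no values recorded")
        return self.total / self.count


def run_epoch(batch_losses: List[float], batch_sizes: List[int]) -> float:
    # Hypothetical loop: each loss stands in for a per-step value that has
    # already been turned into a lightweight Python float.
    meter = RunningMean()
    for loss, n in zip(batch_losses, batch_sizes):
        meter.update(loss, n)
    return meter.compute()
```

For example, `run_epoch([1.0, 4.0], [1, 2])` gives `3.0`, the sample-weighted mean, without ever storing the individual batch outputs.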
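The difference between the last two points can be simulated without any real distributed setup. The sketch below is an assumption-laden illustration, not TorchMetrics code: each "shard" stands in for the data one process sees. Computing accuracy per shard gives a different number on every process, while first summing the metric states (correct count, sample count) across shards and then computing once gives a single global value. That is roughly the pattern TorchMetrics follows when it reduces its states across processes. `local_state`, `per_process_accuracy`, and `synced_accuracy` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Shard = Tuple[Sequence[int], Sequence[int]]  # (predictions, targets)


def local_state(preds: Sequence[int], targets: Sequence[int]) -> Tuple[int, int]:
    """Per-process metric state: (number correct, number seen)."""
    correct = sum(int(p == t) for p, t in zip(preds, targets))
    return correct, len(targets)


def per_process_accuracy(shards: List[Shard]) -> List[float]:
    # What each process would report if it computed the metric on its own shard only.
    results = []
    for preds, targets in shards:
        correct, total = local_state(preds, targets)
        results.append(correct / total)
    return results


def synced_accuracy(shards: List[Shard]) -> float:
    # Sum the states from all processes first (what an all-reduce of the
    # states would do in real distributed training), then compute once.
    states = [local_state(p, t) for p, t in shards]
    correct = sum(c for c, _ in states)
    total = sum(n for _, n in states)
    return correct / total
```

With `shards = [([1, 1, 0, 0], [1, 1, 0, 0]), ([1, 0], [0, 0])]`, the processes would report `1.0` and `0.5` individually, and even averaging those (`0.75`) differs from the synced value of `5/6`, because the shards have different sizes.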