I’m writing a project about fine-tuning a sequence generation model. I’m looking for an example of how to gather the generation results produced on different GPUs (one machine, multiple GPUs), so I can calculate a correct ROUGE score over the whole validation dataset. I know DDP can sync tensors across devices, but I have no idea how to gather the ROUGE scores computed on different devices.
You may find it useful to take a look at `def validation_epoch_end(self, outputs)` — `outputs` will contain all the outputs from `validation_step` (a list of the returned values).
Also, DP has something like `validation_step_end`, which gathers the outputs of `validation_step` across all GPUs in your node (for dp or ddp2). Note: `validation_epoch_end` aggregates outputs from all batches, while `validation_step_end` aggregates outputs from a single step. So, if I am not mistaken, with a batch of 16 split across 2 GPUs, `validation_step_end` will receive a tuple of 2 elements with 8 sequences each, and with 3 batches `validation_epoch_end` will receive a list of 3 elements with 16 sequences each.
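To make the shapes concrete, here is a minimal plain-Python sketch of what `validation_epoch_end` receives and how you could aggregate ROUGE from it. The hook names come from Lightning, but the dict keys (`rouge_l_sum`, `n_examples`) are my own invention — use whatever your `validation_step` actually returns. Returning a per-batch *sum* and *count* (instead of a pre-averaged score) keeps the final mean exact even when batch sizes differ:

```python
def validation_step(batch):
    """Stand-in for LightningModule.validation_step: score one batch.

    Returns a sum and a count so the epoch-level aggregation is exact
    regardless of batch sizes.
    """
    scores = [score for _, score in batch]  # per-example ROUGE-L scores
    return {"rouge_l_sum": sum(scores), "n_examples": len(scores)}


def validation_epoch_end(outputs):
    """Stand-in for LightningModule.validation_epoch_end.

    `outputs` is the list of dicts returned by validation_step,
    one entry per batch.
    """
    total = sum(o["rouge_l_sum"] for o in outputs)
    count = sum(o["n_examples"] for o in outputs)
    return total / count  # corpus-level mean ROUGE


# Three toy batches of (prediction, rouge_score) pairs, uneven sizes.
batches = [
    [("a", 0.50), ("b", 0.70)],
    [("c", 0.90)],
    [("d", 0.30), ("e", 0.60), ("f", 0.80)],
]
outputs = [validation_step(b) for b in batches]
mean_rouge = validation_epoch_end(outputs)
```

In real Lightning code the ROUGE scoring itself would happen inside `validation_step` (e.g. with a ROUGE implementation of your choice); only the aggregation logic above belongs in `validation_epoch_end`.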
Also, I believe that `self.all_gather` can help you synchronise tensors across all devices.
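One pitfall when gathering with ddp: averaging the per-GPU mean ROUGE scores is only correct if every GPU scored the same number of examples, so it is safer to gather a (sum, count) pair per device and divide once at the end. Below is a stdlib sketch of that reduction; the `all_gather` function here is a stand-in for the real collective (in Lightning you would call `self.all_gather(...)` on a tensor instead):

```python
def all_gather(local_value, world):
    """Stand-in for the Lightning / torch.distributed collective:
    every rank ends up with a list of every rank's value.
    Here it just returns the full world state directly."""
    return list(world)


# Per-device partial statistics: (sum of ROUGE scores, number of examples).
# Device 1 happened to receive fewer validation examples than device 0,
# which is exactly the case where averaging averages goes wrong.
device_stats = [(3.0, 4), (0.9, 1)]

gathered = all_gather(device_stats[0], device_stats)
total = sum(s for s, _ in gathered)
count = sum(n for _, n in gathered)
correct_mean = total / count  # weighted by example count: 3.9 / 5

# Naively averaging the per-device means over-weights the small shard:
naive_mean = sum(s / n for s, n in gathered) / len(gathered)
```

The same (sum, count) trick is what metric libraries such as torchmetrics do internally when syncing metric state across processes.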
You may need to combine some of these. If that wasn’t helpful, could you please provide more details on what you are doing, or show some examples?