I am training a fully convolutional siamese model, which runs a stack of 3D convolutions over some data and converts the result into an embedding with an AdaptiveAvgPool3d layer at the end. The fully convolutional nature of the model makes it impossible to use batch sizes greater than 1, but it still runs quite efficiently at 100% GPU utilisation.
However, I do have multiple GPUs, so I wondered: is it possible to split this kind of training across them? The straightforward approach would be to keep a copy of the model on each GPU, run backward on one sample per replica to accumulate gradients, average them, and then take a single optimizer step. Sadly, I don't quite understand how I would implement this in Lightning - which parallel strategy would be the right one for this, if any?
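To make the idea concrete, here is a rough sketch of the scheme I have in mind, written in plain PyTorch on CPU with a toy `nn.Linear` standing in for my real siamese network (the model, sample shapes, and replica count are all placeholders). Each replica processes one sample, gradients are averaged across replicas as a distributed all-reduce would do, and one optimizer step is taken on the master copy:

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for the real fully convolutional siamese model.
model = nn.Linear(4, 2)
replicas = [copy.deepcopy(model) for _ in range(2)]  # one replica per "GPU"

# Each replica gets its own batch-size-1 sample and calls backward,
# so each replica holds the gradient for its sample.
samples = [torch.randn(1, 4) for _ in range(2)]
for replica, x in zip(replicas, samples):
    loss = replica(x).pow(2).mean()
    loss.backward()

# Average the per-replica gradients (what an all-reduce would do),
# write them into the master copy, and take a single optimizer step.
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for p_master, *p_reps in zip(model.parameters(),
                             *[r.parameters() for r in replicas]):
    p_master.grad = torch.stack([p.grad for p in p_reps]).mean(dim=0)
opt.step()
```

If I understand correctly, this is essentially what data-parallel training does, so I am guessing something like `Trainer(accelerator="gpu", devices=2, strategy="ddp")` would be the Lightning equivalent - but I would appreciate confirmation that DDP is indeed the right strategy here despite the batch size of 1 per device.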