Context:
- I would like to train several models at the same time. These models share the same structure but may differ in initialization or some options.
- It is expensive for the data loader to prepare a batch of data, so once a batch has been prepared, it is desirable to reuse it to train several models in parallel.
- Each model is small enough that several of them fit easily on a single GPU.
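A minimal sketch of this setup, assuming PyTorch (the framework is not stated in the question): build several copies of the same architecture with different random seeds, give each its own optimizer, and loop over the models inside the batch loop so every expensive batch is reused once per model. The `make_model` helper, the toy `batches` generator, and all sizes below are hypothetical stand-ins for illustration.

```python
import torch
from torch import nn

def make_model(seed: int) -> nn.Module:
    # Same structure for every model; the seed gives each one
    # its own initialization (this mirrors "differ in initialization").
    torch.manual_seed(seed)
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

models = [make_model(seed) for seed in range(3)]
optimizers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in models]
loss_fn = nn.MSELoss()

def batches(n_batches: int, batch_size: int = 32):
    # Stand-in for the expensive data loader: each batch is prepared once.
    for _ in range(n_batches):
        x = torch.randn(batch_size, 8)
        y = x.sum(dim=1, keepdim=True)
        yield x, y

for x, y in batches(5):
    # Reuse the already-prepared batch: one optimization step per model.
    for model, opt in zip(models, optimizers):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```

Note that the inner loop runs the models sequentially on the GPU rather than truly in parallel; since the models are small, launching them on separate CUDA streams, or batching their parameters together (e.g. with `torch.func.stack_module_state` plus `torch.vmap` in recent PyTorch versions), are possible ways to overlap their execution on a single device.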