used the torchdistx package and its integration in Trainer |
materialize the model weights manually, or follow our guide for initializing large models |
PR17995 |
defined def training_step(self, dataloader_iter, batch_idx) in LightningModule |
remove batch_idx from the signature and expect dataloader_iter to return a triplet (batch, batch_idx, dataloader_idx) |
PR18390 |
defined def validation_step(self, dataloader_iter, batch_idx) in LightningModule |
remove batch_idx from the signature and expect dataloader_iter to return a triplet (batch, batch_idx, dataloader_idx) |
PR18390 |
defined def test_step(self, dataloader_iter, batch_idx) in LightningModule |
remove batch_idx from the signature and expect dataloader_iter to return a triplet (batch, batch_idx, dataloader_idx) |
PR18390 |
defined def predict_step(self, dataloader_iter, batch_idx) in LightningModule |
remove batch_idx from the signature and expect dataloader_iter to return a triplet (batch, batch_idx, dataloader_idx) |
PR18390 |
used batch = next(dataloader_iter) in LightningModule *_step hooks |
use batch, batch_idx, dataloader_idx = next(dataloader_iter) |
PR18390 |
relied on automatic detection of Kubeflow environment |
use Trainer(plugins=KubeflowEnvironment()) to explicitly set it on a Kubeflow cluster |
PR18137 |
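
The dataloader_iter rows above all describe the same change: the step hook no longer receives batch_idx as an argument, and each call to next(dataloader_iter) yields a triplet instead of a bare batch. A minimal sketch of that calling pattern follows. It is plain Python and does not import Lightning: `fake_dataloader_iter` is a hypothetical stand-in for the iterator the Trainer would pass in, the step is written as a free function rather than a LightningModule method, and the "loss" is just a mean used for illustration.

```python
from typing import Dict, Iterator, List, Tuple


def fake_dataloader_iter(
    batches: List[List[float]], dataloader_idx: int = 0
) -> Iterator[Tuple[List[float], int, int]]:
    """Hypothetical stand-in for the iterator handed to *_step hooks.

    Yields (batch, batch_idx, dataloader_idx) triplets, which is the shape
    the migration table says to expect.
    """
    for batch_idx, batch in enumerate(batches):
        yield batch, batch_idx, dataloader_idx


def training_step(dataloader_iter: Iterator[Tuple[List[float], int, int]]) -> Dict[str, float]:
    # New-style signature: no batch_idx parameter.
    # The index now arrives with the batch, so unpack all three values.
    batch, batch_idx, dataloader_idx = next(dataloader_iter)

    # Illustrative "loss" only: the mean of the batch values.
    loss = sum(batch) / len(batch)
    return {"loss": loss, "batch_idx": batch_idx, "dataloader_idx": dataloader_idx}


# Drive the step twice from the same iterator, as a training loop would.
it = fake_dataloader_iter([[1.0, 2.0, 3.0], [4.0, 6.0]])
first = training_step(it)
second = training_step(it)
```

Code that previously wrote batch = next(dataloader_iter) would, under the new behavior, bind the whole triplet to batch, so the unpacking line is the part to update in each hook.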