I don’t have any experience with TorchElastic, but perhaps you could pass your own `ModelCheckpoint` callback with a defined `filepath`, so you always know where the checkpoint is saved.