IPUStrategy
- class pytorch_lightning.strategies.IPUStrategy(accelerator=None, device_iterations=1, autoreport=False, autoreport_dir=None, parallel_devices=None, cluster_environment=None, checkpoint_io=None, precision_plugin=None, training_opts=None, inference_opts=None)[source]
Bases: pytorch_lightning.strategies.parallel.ParallelStrategy
Plugin for training on IPU devices.
- Parameters
device_iterations – Number of iterations to run on the device at once before returning to the host. This can be used as an optimization to speed up training. See https://docs.graphcore.ai/projects/poptorch-user-guide/en/0.1.67/batching.html
autoreport – Enable auto-reporting for IPUs using PopVision. See https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html
autoreport_dir – Optional directory to store autoReport output.
training_opts – Optional poptorch.Options to override the default created options for training.
inference_opts – Optional poptorch.Options to override the default created options for validation/testing and predicting.
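As a rough illustration of how these parameters fit together, the sketch below collects the constructor arguments and estimates how many samples the host feeds the device per step. The helpers `samples_per_host_step` and `ipu_strategy_kwargs` are hypothetical (they are not part of Lightning), and the multiplication rule is an assumption based on the PopTorch batching guide linked above, simplified to a single replica without gradient accumulation. The guarded section only succeeds where `pytorch_lightning` with IPU support and IPU hardware are available.

```python
from typing import Any, Dict, Optional


def samples_per_host_step(batch_size: int, device_iterations: int = 1) -> int:
    """Samples consumed per host-to-device round trip.

    Assumed rule: batch_size * device_iterations (single replica,
    no gradient accumulation).
    """
    if batch_size < 1 or device_iterations < 1:
        raise ValueError("batch_size and device_iterations must be >= 1")
    return batch_size * device_iterations


def ipu_strategy_kwargs(
    device_iterations: int = 1,
    autoreport: bool = False,
    autoreport_dir: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: gather IPUStrategy keyword arguments in one place."""
    return {
        "device_iterations": device_iterations,
        "autoreport": autoreport,
        "autoreport_dir": autoreport_dir,
    }


kwargs = ipu_strategy_kwargs(
    device_iterations=32, autoreport=True, autoreport_dir="report_dir/"
)
print(samples_per_host_step(batch_size=4, device_iterations=32))  # 128

try:
    import pytorch_lightning as pl
    from pytorch_lightning.strategies import IPUStrategy

    strategy = IPUStrategy(**kwargs)
    trainer = pl.Trainer(accelerator="ipu", devices=8, strategy=strategy)
except Exception as err:
    # Missing package, missing IPU hardware, or a Lightning release
    # that does not ship IPU support.
    strategy = None
    print(f"IPUStrategy is not available in this environment: {err}")
```

Passing `training_opts` or `inference_opts` instead would hand the strategy a pre-built `poptorch.Options` object in place of the options it creates by default.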
- all_gather(tensor, group=None, sync_grads=False)[source]
Perform an all_gather on all processes.
- barrier(name=None)[source]
Synchronizes all processes, blocking each one until the whole group enters this function.
- broadcast(obj, src=0)[source]
Broadcasts an object to all processes.
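The intent of `all_gather` and `broadcast` can be pictured with a toy model in plain Python, where a list index stands in for a process rank. This is only a sketch of the general collective semantics; the function names are hypothetical, and it makes no claim about how `IPUStrategy` implements these methods.

```python
from typing import List, TypeVar

T = TypeVar("T")


def toy_all_gather(per_process: List[T]) -> List[List[T]]:
    """Every process ends up with the values of all processes, in rank order."""
    return [list(per_process) for _ in per_process]


def toy_broadcast(per_process: List[T], src: int = 0) -> List[T]:
    """Every process ends up with the object held by rank `src`."""
    return [per_process[src] for _ in per_process]


values = ["a", "b", "c"]  # index = rank of the process holding the value
gathered = toy_all_gather(values)
shared = toy_broadcast(values, src=0)

print(gathered[1])  # ['a', 'b', 'c']
print(shared)       # ['a', 'a', 'a']
```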
- on_predict_end()[source]
Called when predict ends.
- on_predict_start()[source]
Called when predict begins.
- on_test_end()[source]
Called when test ends.
- on_test_start()[source]
Called when test begins.
- on_train_batch_start(batch, batch_idx, dataloader_idx=0)[source]
Called in the training loop before anything happens for that batch.
- on_train_end()[source]
Called when train ends.
- on_train_start()[source]
Called when train begins.
- on_validation_end()[source]
Called when validation ends.
- on_validation_start()[source]
Called when validation begins.
- predict_step(*args, **kwargs)[source]
The actual predict step. See predict_step() for more details.
- reduce(tensor, *args, **kwargs)[source]
Reduces the given tensor (e.g. across GPUs/processes).
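A reduction combines one value per process into a single result. The toy function below shows the idea with plain floats; `toy_reduce` and its `op` names are hypothetical and are not the arguments accepted by `reduce` on this strategy.

```python
from typing import List


def toy_reduce(per_process: List[float], op: str = "mean") -> float:
    """Combine one value per process into a single number."""
    if not per_process:
        raise ValueError("need at least one value to reduce")
    total = sum(per_process)
    if op == "sum":
        return total
    if op == "mean":
        return total / len(per_process)
    raise ValueError(f"unsupported op: {op!r}")


losses = [0.5, 1.5, 1.0]  # one loss value per process
print(toy_reduce(losses, "sum"))   # 3.0
print(toy_reduce(losses, "mean"))  # 1.0
```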
- setup(trainer)[source]
Sets up plugins for the trainer fit and creates optimizers.
- setup_optimizers(trainer)[source]
Creates optimizers and schedulers.
- teardown()[source]
This method is called to tear down the training process.
It is the right place to release memory and free other resources.
- test_step(*args, **kwargs)[source]
The actual test step. See test_step() for more details.
- training_step(*args, **kwargs)[source]
The actual training step. See training_step() for more details.
- validation_step(*args, **kwargs)[source]
The actual validation step. See validation_step() for more details.
- property is_global_zero: bool
Whether the current process is the rank zero process not only on the local node, but for all nodes.
- Return type
bool
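The distinction between local rank zero and global rank zero can be sketched as follows. The rank layout used here (global rank = node rank × processes per node + local rank) is a common convention assumed for illustration, and both functions are hypothetical stand-ins rather than Lightning code.

```python
def global_rank(node_rank: int, local_rank: int, processes_per_node: int) -> int:
    """Assumed layout: ranks are numbered node by node."""
    return node_rank * processes_per_node + local_rank


def toy_is_global_zero(node_rank: int, local_rank: int, processes_per_node: int) -> bool:
    """True only for the single process that is rank zero across all nodes."""
    return global_rank(node_rank, local_rank, processes_per_node) == 0


print(toy_is_global_zero(0, 0, 4))  # True
print(toy_is_global_zero(1, 0, 4))  # False: local rank 0, but on the second node
```

A typical use of such a flag is to restrict logging or checkpoint writing to one process for the whole job.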
- property lightning_module: Optional[pytorch_lightning.core.lightning.LightningModule]
Returns the pure LightningModule without potential wrappers.
- Return type
Optional[pytorch_lightning.core.lightning.LightningModule]
- property root_device: torch.device
Return the root device.
- Return type
torch.device