IPUStrategy
- class lightning.pytorch.strategies.IPUStrategy(accelerator=None, device_iterations=1, autoreport=False, autoreport_dir=None, parallel_devices=None, cluster_environment=None, checkpoint_io=None, precision_plugin=None, training_opts=None, inference_opts=None)
- Bases: lightning.pytorch.strategies.parallel.ParallelStrategy
- Plugin for training on IPU devices.
- Warning: This is an experimental feature.
- Parameters
- device_iterations – Number of iterations to run on the device before returning to the host. Increasing it can speed up training. See https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/batching.html
- autoreport – Enable auto-reporting for IPUs using PopVision. See https://docs.graphcore.ai/projects/graphcore-popvision-user-guide/en/latest/graph/graph.html
- autoreport_dir – Optional directory in which to store the autoReport output.
- training_opts – Optional poptorch.Options to override the default options created for training.
- inference_opts – Optional poptorch.Options to override the default options created for validation, testing, and predicting.
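
As a rough illustration of how the parameters above fit together, the sketch below collects the documented constructor arguments and shows how they might be passed to a Trainer. The helper names (`ipu_strategy_kwargs`, `samples_per_host_call`, `build_trainer`) and the `ipu_reports` directory are hypothetical and not part of Lightning. The sample-count arithmetic assumes the behaviour described in the linked PopTorch batching guide, with no replication or gradient accumulation. `build_trainer` is only defined, never called, because it needs a Lightning release that still ships IPU support, plus PopTorch and IPU hardware.

```python
from typing import Any, Dict, Optional


def ipu_strategy_kwargs(
    device_iterations: int = 1,
    autoreport: bool = False,
    autoreport_dir: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect keyword arguments for IPUStrategy (hypothetical helper)."""
    # This check is the sketch's own sanity test, not Lightning behaviour.
    if device_iterations < 1:
        raise ValueError("device_iterations must be a positive integer")
    return {
        "device_iterations": device_iterations,
        "autoreport": autoreport,
        "autoreport_dir": autoreport_dir,
    }


def samples_per_host_call(batch_size: int, device_iterations: int) -> int:
    """Samples the host supplies per call to the device.

    Assumes no replication and no gradient accumulation, following the
    PopTorch batching guide linked in the parameter list.
    """
    return batch_size * device_iterations


def build_trainer(kwargs: Dict[str, Any]):
    """Create a Trainer that uses IPUStrategy (requires IPU support)."""
    # Imported lazily so the rest of the sketch runs without Lightning.
    from lightning.pytorch import Trainer
    from lightning.pytorch.strategies import IPUStrategy

    return Trainer(accelerator="ipu", devices=1, strategy=IPUStrategy(**kwargs))


kwargs = ipu_strategy_kwargs(
    device_iterations=4, autoreport=True, autoreport_dir="ipu_reports"
)
per_call = samples_per_host_call(batch_size=8, device_iterations=4)
```

On a machine with IPUs, `build_trainer(kwargs)` would be the natural next step. Keeping the imports inside that function lets the argument handling be checked anywhere.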
 
 - all_gather(tensor, group=None, sync_grads=False)
- Perform an all_gather on all processes. - Return type
 
 - barrier(name=None)
- Synchronizes all processes, blocking each one until the whole group has entered this function.
 - batch_to_device(batch, device=None, dataloader_idx=0)
- Moves the batch to the correct device. - The returned batch is of the same type as the input batch, just having all tensors on the correct device. 
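
The type-preserving contract described above can be sketched in plain Python. This is not Lightning's or PopTorch's implementation: `FakeTensor` is a hypothetical stand-in for `torch.Tensor`, `move_to_device` is an illustrative helper, and the device label `"device-0"` is made up. Only dicts, lists, and plain tuples are handled; namedtuples and custom containers would need extra cases.

```python
from typing import Any


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor with a .to(device) method."""

    def __init__(self, value: Any, device: str = "cpu") -> None:
        self.value = value
        self.device = device

    def to(self, device: str) -> "FakeTensor":
        # Return a new object, leaving the original untouched.
        return FakeTensor(self.value, device)


def move_to_device(batch: Any, device: str) -> Any:
    """Recursively move tensors while keeping the container types."""
    if isinstance(batch, FakeTensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return type(batch)(
            (key, move_to_device(value, device)) for key, value in batch.items()
        )
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(item, device) for item in batch)
    # Non-tensor leaves (strings, ints, ...) are returned unchanged.
    return batch


batch = {"x": FakeTensor(1), "pair": (FakeTensor(2), "label")}
moved = move_to_device(batch, "device-0")
```

After the call, `moved` is still a dict whose `"pair"` entry is still a tuple; only the tensor-like leaves carry the new device label.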
 - on_train_batch_start(batch, batch_idx)
- Called in the training loop before anything happens for that batch. - Return type
 
 - teardown()
- This method is called to tear down the training process. - It is the right place to release memory and free other resources. - Return type
 
 - training_step(*args, **kwargs)
- The actual training step. - See training_step() for more details.
 - validation_step(*args, **kwargs)
- The actual validation step. - See validation_step() for more details.
 - property is_global_zero: bool
- Whether the current process is the rank zero process, not only on the local node but across all nodes. - Return type: bool
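
The difference between local and global rank zero can be shown with a small sketch. It assumes a common layout in which every node runs the same number of processes and global ranks are assigned node by node. The function names and that layout are assumptions made for illustration, not how Lightning computes the property.

```python
def global_rank(node_rank: int, local_rank: int, processes_per_node: int) -> int:
    """Global rank under an assumed node-major layout."""
    return node_rank * processes_per_node + local_rank


def is_global_zero(node_rank: int, local_rank: int, processes_per_node: int) -> bool:
    """True only for the single process whose global rank is 0."""
    return global_rank(node_rank, local_rank, processes_per_node) == 0


# Two nodes with four processes each: local rank 0 exists on both nodes,
# but only the one on node 0 is the global zero.
first_node_leader = is_global_zero(node_rank=0, local_rank=0, processes_per_node=4)
second_node_leader = is_global_zero(node_rank=1, local_rank=0, processes_per_node=4)
```

This is why tasks that must run exactly once per job, such as writing a single log file, are usually guarded by the global check rather than a local one.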
 
 - property root_device: torch.device
- Return the root device. - Return type: torch.device