Shortcuts

Accelerator

class pytorch_lightning.accelerators.Accelerator(precision_plugin, training_type_plugin)[source]

Bases: object

The Accelerator Base Class. An Accelerator is meant to deal with one type of Hardware.

Currently there are accelerators for:

  • CPU

  • GPU

  • TPU

  • IPU

Each Accelerator gets two plugins upon initialization: One to handle differences from the training routine and one to handle different precisions.

Parameters
  • precision_plugin (PrecisionPlugin) – the plugin to handle precision-specific parts

  • training_type_plugin (TrainingTypePlugin) – the plugin to handle different training routines

all_gather(tensor, group=None, sync_grads=False)[source]

Function to gather a tensor from several distributed processes.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.all_gather directly.

Parameters
  • tensor (Tensor) – tensor of shape (batch, …)

  • group (Optional[Any]) – the process group to gather results from. Defaults to all processes (world)

  • sync_grads (bool) – flag that allows users to synchronize gradients for all_gather op

Return type

Tensor

Returns

A tensor of shape (world_size, batch, …)

abstract static auto_device_count()[source]

Get the devices when set to auto.

Return type

int

backward(closure_loss, *args, **kwargs)[source]

Forwards backward-calls to the precision plugin.

Parameters

closure_loss (Tensor) – a tensor holding the loss value to backpropagate

Return type

Tensor

barrier(name=None)[source]

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.barrier directly.

Return type

None

batch_to_device(batch, device=None, dataloader_idx=0)[source]

Moves the batch to the correct device. The returned batch is of the same type as the input batch, just having all tensors on the correct device.

Parameters
  • batch (Any) – The batch of samples to move to the correct device

  • device (Optional[device]) – The target device

  • dataloader_idx (int) – The index of the dataloader to which the batch belongs.

Return type

Any

broadcast(obj, src=0)[source]

Broadcasts an object to all processes, such that the src object is broadcast to all other ranks if needed.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.broadcast directly.

Parameters
  • obj (object) – Object to broadcast to all process, usually a tensor or collection of tensors.

  • src (int) – The source rank of which the object will be broadcast from

Return type

object

connect(model)[source]

Transfers ownership of the model to this plugin.

See deprecation warning below.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.on_train_batch_start directly.

Return type

None

dispatch(trainer)[source]

Hook to do something before the training/evaluation/prediction starts.

Return type

None

get_device_stats(device)[source]

Gets stats for a given device.

Parameters

device (Union[str, device]) – device for which to get stats

Return type

Dict[str, Any]

Returns

Dictionary of device stats

lightning_module_state_dict()[source]

Returns state of model.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.lightning_module_state_dict directly.

Allows for syncing/collating model state from processes in custom plugins.

Return type

Dict[str, Union[Any, Tensor]]

model_sharded_context()[source]

Provide hook to create modules in a distributed aware context. This is useful for when we’d like to.

shard the model instantly - useful for extremely large models. Can save memory and initialization time.

Return type

Generator[None, None, None]

Returns

Model parallel context.

on_predict_end()[source]

Called when predict ends.

See deprecation warning below.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.on_predict_end directly.

Return type

None

on_predict_start()[source]

Called when predict begins.

See deprecation warning below.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.on_predict_start directly.

Return type

None

on_test_end()[source]

Called when test end.

See deprecation warning below.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.on_test_end directly.

Return type

None

on_test_start()[source]

Called when test begins.

See deprecation warning below.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.on_test_start directly.

Return type

None

on_train_batch_start(batch, batch_idx, dataloader_idx=0)[source]

Called in the training loop before anything happens for that batch.

See deprecation warning below.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.on_train_batch_start directly.

Return type

None

on_train_end()[source]

Called when train ends.

See deprecation warning below.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.on_train_end directly.

Return type

None

on_train_start()[source]

Called when train begins.

Return type

None

on_validation_end()[source]

Called when validation ends.

See deprecation warning below.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.on_validation_end directly.

Return type

None

on_validation_start()[source]

Called when validation begins.

See deprecation warning below.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.on_validation_start directly.

Return type

None

optimizer_state(optimizer)[source]

Returns state of an optimizer.

Allows for syncing/collating optimizer state from processes in custom plugins.

Return type

Dict[str, Tensor]

optimizer_step(optimizer, opt_idx, closure, model=None, **kwargs)[source]

performs the actual optimizer step.

Parameters
  • optimizer (Optimizer) – the optimizer performing the step

  • opt_idx (int) – index of the current optimizer

  • closure (Callable[[], Any]) – closure calculating the loss value

  • model (Union[LightningModule, Module, None]) – reference to the model, optionally defining optimizer step related hooks

  • **kwargs – Any extra arguments to optimizer.step

Return type

None

optimizer_zero_grad(current_epoch, batch_idx, optimizer, opt_idx)[source]

Zeros all model parameter’s gradients.

Return type

None

post_dispatch(trainer)[source]

Hook to do something after the training/evaluation/prediction starts.

Return type

None

post_training_step()[source]

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.post_training_step directly.

Return type

None

pre_dispatch(trainer)[source]

Hook to do something before the training/evaluation/prediction starts.

Return type

None

predict_step(step_kwargs)[source]

The actual predict step.

See predict_step() for more details

Return type

Union[Tensor, Dict[str, Any]]

process_dataloader(dataloader)[source]

Wraps the dataloader if necessary.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.process_dataloader directly.

Parameters

dataloader (Union[Iterable, DataLoader]) – iterable. Ideally of type: torch.utils.data.DataLoader

Return type

Union[Iterable, DataLoader]

save_checkpoint(checkpoint, filepath)[source]

Save model/training states as a checkpoint file through state-dump and file-write.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.save_checkpoint directly.

Parameters
  • checkpoint (Dict[str, Any]) – dict containing model and trainer state

  • filepath (Union[str, Path]) – write-target file’s path

Return type

None

setup(trainer)[source]

Setup plugins for the trainer fit and creates optimizers.

Parameters

trainer (Trainer) – the trainer instance

Return type

None

setup_environment()[source]

Setup any processes or distributed connections.

This is called before the LightningModule/DataModule setup hook which allows the user to access the accelerator environment before setup is complete.

Return type

None

setup_optimizers(trainer)[source]

Creates optimizers and schedulers.

Parameters

trainer (Trainer) – the Trainer, these optimizers should be connected to

Return type

None

setup_precision_plugin()[source]

Attaches the precision plugin to the accelerator.

Return type

None

setup_training_type_plugin()[source]

Attaches the training type plugin to the accelerator.

Return type

None

start_evaluating(trainer)[source]

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.start_evaluating directly.

Return type

None

start_predicting(trainer)[source]

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.start_predicting directly.

Return type

None

start_training(trainer)[source]

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.start_training directly.

Return type

None

teardown()[source]

This method is called to teardown the training process.

It is the right place to release memory and free other resources.

Return type

None

test_step(step_kwargs)[source]

The actual test step.

See test_step() for more details

Return type

Union[Tensor, Dict[str, Any], None]

test_step_end(output)[source]

A hook to do something at the end of the test step.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.test_step_end directly.

Parameters

output (Union[Tensor, Dict[str, Any], None]) – the output of the test step

Return type

Union[Tensor, Dict[str, Any], None]

training_step(step_kwargs)[source]

The actual training step.

See training_step() for more details

Return type

Union[Tensor, Dict[str, Any]]

training_step_end(output)[source]

A hook to do something at the end of the training step.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.training_step_end directly.

Parameters

output (Union[Tensor, Dict[str, Any]]) – the output of the training step

Return type

Union[Tensor, Dict[str, Any]]

validation_step(step_kwargs)[source]

The actual validation step.

See validation_step() for more details

Return type

Union[Tensor, Dict[str, Any], None]

validation_step_end(output)[source]

A hook to do something at the end of the validation step.

Deprecated since version v1.5: This method is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.validation_step_end directly.

Parameters

output (Union[Tensor, Dict[str, Any], None]) – the output of the validation step

Return type

Union[Tensor, Dict[str, Any], None]

property lightning_module: pytorch_lightning.core.lightning.LightningModule

Returns the pure LightningModule.

To get the potentially wrapped model use Accelerator.model

Return type

LightningModule

property model: torch.nn.modules.module.Module

Returns the model.

This can also be a wrapped LightningModule. For retrieving the pure LightningModule use Accelerator.lightning_module

Return type

Module

property restore_checkpoint_after_pre_dispatch: bool

Override to delay restoring from checkpoint till after pre-dispatch. This is useful when the plugin requires all the setup hooks to run before loading checkpoint.

Deprecated since version v1.5: This property is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.restore_checkpoint_after_pre_dispatch directly.

Return type

bool

Returns

If true, restore checkpoint after pre_dispatch.

property results: Any

The results of the last run will be cached within the training type plugin.

Deprecated since version v1.5: This property is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.results directly.

In distributed training, we make sure to transfer the results to the appropriate master process.

Return type

Any

property root_device: torch.device

Returns the root device.

Return type

device

property setup_optimizers_in_pre_dispatch: bool

Override to delay setting optimizers and schedulers till after dispatch. This is useful when the TrainingTypePlugin requires operating on the wrapped accelerator model. However this may break certain precision plugins such as APEX which require optimizers to be set.

Deprecated since version v1.5: This property is deprecated in v1.5 and will be removed in v1.6. Please call training_type_plugin.setup_optimizers_in_pre_dispatch directly.

Return type

bool

Returns

If True, delay setup optimizers until pre_dispatch, else call within setup.