Strategy¶
- class lightning.pytorch.strategies.Strategy(accelerator=None, checkpoint_io=None, precision_plugin=None)[source]¶
Bases: ABC
Base class for all strategies that change the behaviour of the training, validation and test loops.
- abstract all_gather(tensor, group=None, sync_grads=False)[source]¶
Perform an all_gather on all processes.
- backward(closure_loss, optimizer, *args, **kwargs)[source]¶
Forwards backward-calls to the precision plugin.
- Parameters:
closure_loss (Tensor) – a tensor holding the loss value to backpropagate
optimizer (Optional[Optimizer]) – an optional optimizer that gets passed down to the precision plugin’s backward
*args (Any) – positional arguments that get passed down to the precision plugin’s backward, intended as arguments for the actual function that performs the backward, like backward()
**kwargs (Any) – keyword arguments for the same purpose as *args
- Return type:
- abstract barrier(name=None)[source]¶
Synchronizes all processes by blocking each one until the whole group has entered this function.
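The blocking behaviour of a barrier can be illustrated with a minimal standard-library sketch. It is not Lightning's implementation: threads stand in for processes, and `WORLD_SIZE`, `worker`, and `events` are hypothetical names chosen for this example.

```python
import threading

WORLD_SIZE = 4  # hypothetical group size; threads stand in for processes
barrier = threading.Barrier(WORLD_SIZE)
events = []
events_lock = threading.Lock()


def worker(rank):
    # Record that this worker reached the barrier.
    with events_lock:
        events.append(("before", rank))
    # Blocks until all WORLD_SIZE workers have called wait().
    barrier.wait()
    # Only reachable once every worker has arrived.
    with events_lock:
        events.append(("after", rank))


threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(WORLD_SIZE)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Every "before" entry precedes every "after" entry.
phases = [phase for phase, _ in events]
```

No worker records its "after" entry until all workers have recorded "before", which is the guarantee a barrier gives to code placed after it.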
- batch_to_device(batch, device=None, dataloader_idx=0)[source]¶
Moves the batch to the correct device.
The returned batch is of the same type as the input batch, just having all tensors on the correct device.
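The type-preserving behaviour described above can be sketched without any deep-learning dependency. This is an illustration under stated assumptions, not the library's code: `FakeTensor` is a hypothetical stand-in for a tensor, and `move_to_device` is a hypothetical helper that only handles dicts, lists, and tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only tracks its device."""

    value: float
    device: str = "cpu"

    def to(self, device):
        # Return a copy on the target device, as a real tensor's .to() would.
        return FakeTensor(self.value, device)


def move_to_device(batch, device):
    """Recursively move every FakeTensor in ``batch``, keeping container types."""
    if isinstance(batch, FakeTensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return type(batch)((key, move_to_device(val, device)) for key, val in batch.items())
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(item, device) for item in batch)
    return batch  # non-tensor leaves (ints, strings, ...) pass through unchanged


batch = {"x": [FakeTensor(1.0), FakeTensor(2.0)], "y": (FakeTensor(0.0),), "id": 7}
moved = move_to_device(batch, "cuda:0")
```

The result has the same nesting and container types as the input; only the device recorded on each tensor-like leaf changes, and the original batch is left untouched.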
- connect(model)[source]¶
Called by the accelerator to connect the accelerator and the model with this plugin.
- Return type:
- model_sharded_context()[source]¶
Provides a hook to create modules in a distributed-aware context. This is useful when the model should be sharded as soon as it is created, which can save memory and initialization time for extremely large models.
Returns: Model parallel context.
- Return type:
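As a rough sketch of how such a context hook can be used, the example below wraps model construction in a context manager. `ToyStrategy`, `sharding_active`, and `build_model` are hypothetical names for illustration only; a real strategy would set up its own sharding machinery inside the context.

```python
from contextlib import contextmanager


class ToyStrategy:
    """Hypothetical strategy-like class; not part of Lightning."""

    def __init__(self):
        self.sharding_active = False

    @contextmanager
    def model_sharded_context(self):
        # Enable the (pretend) sharding mode while the model is being built.
        self.sharding_active = True
        try:
            yield
        finally:
            # Always restore the flag, even if model creation raises.
            self.sharding_active = False


def build_model(strategy):
    # Stand-in for layer construction; it records whether sharding was active.
    return {"created_sharded": strategy.sharding_active}


strategy = ToyStrategy()
with strategy.model_sharded_context():
    model = build_model(strategy)
```

Modules created inside the `with` block see the sharding-aware state, and the state is restored once the block exits.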
- on_exception(exception)[source]¶
Called when the trainer execution is interrupted by an exception.
- Return type:
- on_train_batch_start(batch, batch_idx)[source]¶
Called in the training loop before anything happens for that batch.
- Return type:
- optimizer_state(optimizer)[source]¶
Returns state of an optimizer.
Allows for syncing/collating optimizer state from processes in custom plugins.
- optimizer_step(optimizer, closure, model=None, **kwargs)[source]¶
Performs the actual optimizer step.
- Parameters:
- Return type:
- process_dataloader(dataloader)[source]¶
Wraps the dataloader if necessary.
- Parameters:
dataloader (object) – iterable, ideally of type torch.utils.data.DataLoader
- Return type:
- abstract reduce(tensor, group=None, reduce_op='mean')[source]¶
Reduces the given tensor (e.g. across GPUs/processes).
- reduce_boolean_decision(decision, all=True)[source]¶
Reduce a boolean decision across all processes.
- Return type:
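One plausible reading of the `all` flag is sketched below, with a plain list standing in for the per-process decisions. The assumptions here are that `all=True` requires every process to agree and `all=False` requires at least one; the sum over the list stands in for a sum-reduction across processes. The function is a hypothetical free-standing helper, not the method itself.

```python
def reduce_boolean_decision(decisions, all=True):
    """Combine one boolean per simulated process into a single decision.

    ``all`` mirrors the parameter name in the signature above and shadows
    the builtin inside this function only.
    """
    world_size = len(decisions)
    # Stand-in for a sum-reduce across processes: count the True votes.
    votes = sum(int(decision) for decision in decisions)
    if all:
        return votes == world_size  # every process voted True
    return votes > 0  # at least one process voted True


unanimous = reduce_boolean_decision([True, True, True])
split_strict = reduce_boolean_decision([True, False, True])
split_lenient = reduce_boolean_decision([True, False, True], all=False)
```

Under these assumptions a split vote is rejected in the strict mode and accepted in the lenient mode, so every process ends up acting on the same outcome.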
- save_checkpoint(checkpoint, filepath, storage_options=None)[source]¶
Save model/training states as a checkpoint file through state-dump and file-write.
- setup_environment()[source]¶
Set up any processes or distributed connections.
This is called before the LightningModule/DataModule setup hook, which allows the user to access the accelerator environment before setup is complete.
- Return type:
- teardown()[source]¶
This method is called to tear down the training process.
It is the right place to release memory and free other resources.
- Return type:
- training_step(*args, **kwargs)[source]¶
The actual training step.
See training_step() for more details.
- validation_step(*args, **kwargs)[source]¶
The actual validation step.
See validation_step() for more details.
- property handles_gradient_accumulation: bool¶
Whether the plugin handles gradient accumulation internally.
- abstract property is_global_zero: bool¶
Whether the current process is the rank zero process not only on the local node, but for all nodes.
- property lightning_module: LightningModule | None¶
Returns the pure LightningModule without potential wrappers.
- property lightning_restore_optimizer: bool¶
Override to disable Lightning restoring optimizers/schedulers.
This is useful for plugins which manage restoring optimizers/schedulers.