Strategy
- class lightning_fabric.strategies.Strategy(accelerator=None, checkpoint_io=None, precision=None)
Bases: abc.ABC
Base class for all strategies that change the behaviour of the training, validation, and test loops.
- abstract all_gather(tensor, group=None, sync_grads=False)
Perform an all_gather on all processes.
- abstract all_reduce(tensor, group=None, reduce_op='mean')
Reduces the given tensor (e.g. across GPUs/processes).
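The difference between the two collectives can be sketched in a single process. The snippet below is an illustration only: it uses plain Python floats instead of tensors and a list of per-rank values instead of real processes. The helper names `simulate_all_gather` and `simulate_all_reduce` are hypothetical and are not part of lightning_fabric, and only the "mean" and "sum" reductions are modelled.

```python
from typing import List


def simulate_all_gather(per_rank: List[float]) -> List[List[float]]:
    """After an all_gather, every rank holds a copy of the values from all ranks."""
    return [list(per_rank) for _ in per_rank]


def simulate_all_reduce(per_rank: List[float], reduce_op: str = "mean") -> List[float]:
    """After an all_reduce, every rank holds the same single reduced value."""
    total = sum(per_rank)
    if reduce_op == "sum":
        reduced = total
    elif reduce_op == "mean":
        reduced = total / len(per_rank)
    else:
        # Assumption: this sketch deliberately supports only two reductions.
        raise ValueError(f"unsupported reduce_op in this sketch: {reduce_op!r}")
    return [reduced for _ in per_rank]


losses = [1.0, 2.0, 3.0, 6.0]  # one value per simulated rank
gathered = simulate_all_gather(losses)
reduced = simulate_all_reduce(losses, reduce_op="mean")
```

In this sketch `gathered` keeps every rank's contribution visible on every rank, while `reduced` collapses them into one shared number.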
- backward(tensor, module, *args, **kwargs)
Forwards backward-calls to the precision plugin.
- Return type: None
- abstract barrier(name=None)
Synchronizes all processes by blocking each one until the whole group has entered this function.
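The blocking behaviour can be illustrated with the standard library. This is a minimal sketch in which threads stand in for processes and `threading.Barrier` stands in for whatever distributed backend a concrete strategy uses; `WORLD_SIZE` and `worker` are hypothetical names chosen for the example.

```python
import threading

WORLD_SIZE = 4  # hypothetical number of workers in the group
barrier = threading.Barrier(WORLD_SIZE)
lock = threading.Lock()
arrived = []          # ranks that have reached the barrier
released_counts = []  # how many ranks had arrived when each worker was released


def worker(rank: int) -> None:
    with lock:
        arrived.append(rank)
    barrier.wait()  # blocks until all WORLD_SIZE workers have called wait()
    with lock:
        # No worker gets past the barrier before every worker has arrived.
        released_counts.append(len(arrived))


threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(WORLD_SIZE)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Every entry in `released_counts` equals `WORLD_SIZE`, which is the property a barrier guarantees.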
- batch_to_device(batch, device=None)
Moves the batch to the correct device.
The returned batch is of the same type as the input batch, just having all tensors on the correct device.
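The "same type in, same type out" contract can be sketched as a recursive traversal. The code below is an assumption-laden illustration, not the library's implementation: `FakeTensor` is a hypothetical stand-in for a tensor that only records its device, and `move_to_device` is a hypothetical helper that handles dicts, lists, and tuples only.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for a tensor: it only knows which device it lives on."""

    value: float
    device: str = "cpu"

    def to(self, device: str) -> "FakeTensor":
        # Returns a new object on the target device, leaving the original untouched.
        return FakeTensor(self.value, device)


def move_to_device(batch: Any, device: str) -> Any:
    """Recursively move every FakeTensor in `batch`, preserving container types."""
    if isinstance(batch, FakeTensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return {key: move_to_device(item, device) for key, item in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(item, device) for item in batch)
    return batch  # non-tensor leaves (ints, strings, ...) are returned unchanged


batch = {"x": FakeTensor(1.0), "meta": (FakeTensor(2.0), "label"), "ids": [3, 4]}
moved = move_to_device(batch, "cuda:0")
```

The result has the same dict/tuple/list layout as the input; only the tensor-like leaves report a different device.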
- get_optimizer_state(optimizer)
Returns state of an optimizer.
Allows for syncing/collating optimizer state from processes in custom plugins.
- process_dataloader(dataloader)
Wraps the dataloader if necessary.
- Parameters:
dataloader (DataLoader) – iterable. Ideally of type: torch.utils.data.DataLoader
- Return type: DataLoader
- reduce_boolean_decision(decision, all=True)
Reduce a boolean decision across all processes.
- Return type: bool
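One plausible reading of the `all` flag is sketched below as a single-process simulation. The interpretation is an assumption, since the description above does not spell it out: with `all=True` every process must vote True, and with `all=False` one True vote is enough. The helper name `simulate_reduce_boolean_decision` is hypothetical.

```python
from typing import List


def simulate_reduce_boolean_decision(decisions: List[bool], all: bool = True) -> bool:
    """Combine one boolean per simulated process into a single shared answer."""
    votes = sum(int(decision) for decision in decisions)  # sum-reduce of 0/1 votes
    if all:
        return votes == len(decisions)  # assumed: unanimous agreement required
    return votes > 0  # assumed: a single True vote is sufficient


mixed = [True, True, False]  # one decision per simulated process
strict = simulate_reduce_boolean_decision(mixed, all=True)
lenient = simulate_reduce_boolean_decision(mixed, all=False)
```

A typical use for such a reduction is making sure every process takes the same branch, for example when deciding whether to stop early.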
- save_checkpoint(checkpoint, filepath, storage_options=None)
Save model/training states as a checkpoint file through state-dump and file-write.
- setup_environment()
Set up any processes or distributed connections.
This must be called by the framework at the beginning of every process, before any distributed communication takes place.
- Return type: None
- setup_module(module)
Performs setup for the model, e.g., by wrapping it in another class.
- Return type: Module
- setup_module_and_optimizers(module, optimizers)
Set up a model and multiple optimizers together.
The returned objects are expected to be in the same order they were passed in. The default implementation will call setup_module() and setup_optimizer() on the inputs.
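The default behaviour described above can be sketched as follows. `SketchStrategy` is a hypothetical stand-in written for this example, not the library class, and its "wrapping" is simulated with tuples so that the ordering is easy to inspect.

```python
from typing import Any, List, Tuple


class SketchStrategy:
    """Hypothetical stand-in that mirrors the delegation described above."""

    def setup_module(self, module: Any) -> Any:
        return ("wrapped", module)  # pretend the module was wrapped

    def setup_optimizer(self, optimizer: Any) -> Any:
        return ("wrapped", optimizer)  # pretend the optimizer was wrapped

    def setup_module_and_optimizers(
        self, module: Any, optimizers: List[Any]
    ) -> Tuple[Any, List[Any]]:
        # Delegate to the single-object hooks and keep the input order.
        module = self.setup_module(module)
        optimizers = [self.setup_optimizer(optimizer) for optimizer in optimizers]
        return module, optimizers


strategy = SketchStrategy()
model, opts = strategy.setup_module_and_optimizers("net", ["adam", "sgd"])
```

A subclass that needs to set up the model and optimizers jointly would override the combined method instead of the two single-object hooks.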
- setup_optimizer(optimizer)
Performs setup for the optimizer, e.g., by wrapping it in another class.
- Return type: Optimizer
- teardown()
This method is called to tear down the training process.
It is the right place to release memory and free other resources.
- Return type: None
- abstract property is_global_zero: bool
Whether the current process is the rank-zero process across all nodes, not only on the local node.
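The distinction between local and global rank zero can be made concrete with a small calculation. The sketch assumes a contiguous rank layout with the same number of processes on every node, which is a common convention but not something this page states; the function names are hypothetical.

```python
def global_rank(node_rank: int, local_rank: int, devices_per_node: int) -> int:
    """Assumed layout: ranks are numbered node by node, device by device."""
    return node_rank * devices_per_node + local_rank


def is_global_zero(node_rank: int, local_rank: int, devices_per_node: int) -> bool:
    """True only for the single process whose global rank is 0."""
    return global_rank(node_rank, local_rank, devices_per_node) == 0


# Two nodes with two devices each: both nodes have a local rank 0,
# but only the one on node 0 is the global rank-zero process.
first_on_node0 = is_global_zero(node_rank=0, local_rank=0, devices_per_node=2)
first_on_node1 = is_global_zero(node_rank=1, local_rank=0, devices_per_node=2)
```

Work that must happen exactly once per job, such as writing a checkpoint file, is usually guarded by this kind of check rather than by the local rank.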
- abstract property root_device: torch.device
Returns the root device.