Strategy¶
- class lightning.fabric.strategies.Strategy(accelerator=None, checkpoint_io=None, precision=None)[source]¶
Bases: ABC
Base class for all strategies that change the behaviour of the training, validation, and test loop.
- abstract all_gather(tensor, group=None, sync_grads=False)[source]¶
Perform an all_gather on all processes.
- abstract all_reduce(tensor, group=None, reduce_op='mean')[source]¶
Reduces the given tensor (e.g. across GPUs/processes).
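The reduction semantics can be sketched without any distributed setup. This is a minimal stand-in where the "processes" are just a local list; a real strategy performs this with collective communication (e.g. `torch.distributed.all_reduce`):

```python
def all_reduce_mean(per_process_values):
    """Sketch of a 'mean' all-reduce: after the call, every process holds the
    mean of all processes' values. Illustration only -- real strategies use
    collective communication across ranks, not a local list."""
    mean = sum(per_process_values) / len(per_process_values)
    return [mean for _ in per_process_values]

# Each element plays the role of one process's local tensor value.
reduced = all_reduce_mean([1.0, 3.0])
```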
- backward(tensor, module, *args, **kwargs)[source]¶
Forwards backward calls to the precision plugin.
- abstract barrier(name=None)[source]¶
Synchronizes all processes, blocking each process until the whole group enters this function.
- batch_to_device(batch, device=None)[source]¶
Moves the batch to the correct device.
The returned batch is of the same type as the input batch, just having all tensors on the correct device.
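The "same type as the input batch" behavior can be sketched as a recursive move over nested containers. This is a simplified illustration in plain PyTorch, not the actual implementation:

```python
import torch

def move_to_device(batch, device):
    """Recursively move all tensors in a (possibly nested) batch to `device`,
    preserving container types. Simplified sketch -- illustration only."""
    if isinstance(batch, torch.Tensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return type(batch)((k, move_to_device(v, device)) for k, v in batch.items())
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(v, device) for v in batch)
    return batch  # non-tensor leaves are returned unchanged

batch = {"x": torch.ones(2), "meta": [torch.zeros(1), "label"]}
moved = move_to_device(batch, torch.device("cpu"))
```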
- clip_gradients_norm(module, optimizer, max_norm, norm_type=2.0, error_if_nonfinite=True)[source]¶
Clip gradients by norm.
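The clipping itself is effectively what `torch.nn.utils.clip_grad_norm_` does: scale all gradients so their total norm does not exceed `max_norm`. A small standalone demonstration (plain PyTorch, not the strategy method itself):

```python
import torch

# A tiny model with artificially large gradients.
model = torch.nn.Linear(4, 1)
loss = (model(torch.ones(8, 4)) * 1000).sum()
loss.backward()

# Clip the total gradient 2-norm to at most 1.0. The function returns the
# total norm *before* clipping.
total_norm = torch.nn.utils.clip_grad_norm_(
    model.parameters(), max_norm=1.0, norm_type=2.0
)

# Recompute the norm after clipping to confirm it is now bounded.
new_norm = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters()]), 2
)
```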
- get_optimizer_state(optimizer)[source]¶
Returns state of an optimizer.
Allows for syncing/collating optimizer state from processes in custom plugins.
- load_checkpoint(path, state=None, strict=True)[source]¶
Load the contents from a checkpoint and restore the state of the given objects.
- Parameters:
  - path (Union[str, Path]) – A path to where the file is located.
  - state (Union[Module, Optimizer, Dict[str, Union[Module, Optimizer, Any]], None]) – Can be one of:
    - A dictionary of objects whose state will be restored in-place from the checkpoint path.
    - None or the empty dict: The loaded checkpoint will be returned in full.
    - A Module instance, if the checkpoint file contains a raw module state dict.
    - An Optimizer instance, if the checkpoint file contains a raw optimizer state.
  - strict (bool) – Whether to enforce that the keys in state match the keys in the checkpoint.
- Returns:
The remaining items that were not restored into the given state dictionary. If no state dictionary is given, the full checkpoint will be returned.
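These semantics can be sketched with plain `torch.save`/`torch.load`: objects named in `state` are restored in place, and everything else is handed back. This is an illustration of the documented behavior, not Lightning's implementation:

```python
import os
import tempfile

import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Save a checkpoint containing module and optimizer state plus an extra item.
path = os.path.join(tempfile.mkdtemp(), "ckpt.pt")
torch.save(
    {"model": model.state_dict(), "optimizer": opt.state_dict(), "step": 7},
    path,
)

# Restore in place; items with no matching key in `state` are returned.
checkpoint = torch.load(path)
state = {"model": model, "optimizer": opt}
for name, obj in state.items():
    obj.load_state_dict(checkpoint.pop(name))
remaining = checkpoint
```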
- load_module_state_dict(module, state_dict, strict=True)[source]¶
Loads the given state into the model.
- module_init_context(empty_init=None)[source]¶
A context manager wrapping the model instantiation.
Here, the strategy can control how the parameters of the model get created (device, dtype) and/or apply other patches to the model.
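One thing such a context can do, sketched with plain PyTorch: instantiate parameters on the meta device, so only shapes and dtypes exist and no memory is allocated (requires PyTorch 2.0+; illustration only, not the strategy's actual context manager):

```python
import torch

# Inside the device context, parameter tensors are created on the meta
# device: they carry shape and dtype but allocate no storage.
with torch.device("meta"):
    model = torch.nn.Linear(512, 512)
```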
- process_dataloader(dataloader)[source]¶
Wraps the dataloader if necessary.
- Parameters:
  - dataloader (DataLoader) – An iterable, ideally of type torch.utils.data.DataLoader.
- reduce_boolean_decision(decision, all=True)[source]¶
Reduce a boolean decision across all processes.
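The reduction is a logical AND or OR over the per-process decisions. A local sketch (the real method uses collective communication across ranks, and its parameter is named `all`):

```python
def reduce_boolean_decision(per_process_decisions, require_all=True):
    """Combine per-process boolean decisions into one global decision.
    With require_all=True every process must agree (logical AND); otherwise
    a single True suffices (logical OR). Sketch only -- the real method
    communicates across processes rather than reading a local list."""
    if require_all:
        return all(per_process_decisions)
    return any(per_process_decisions)
```

For example, with decisions `[True, False]` the AND mode yields `False` while the OR mode yields `True`.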
- save_checkpoint(path, state, storage_options=None, filter=None)[source]¶
Save model, optimizer, and other state as a checkpoint file.
- Parameters:
  - path (Union[str, Path]) – A path to where the file(s) should be saved.
  - state (Dict[str, Union[Module, Optimizer, Any]]) – A dictionary with contents to be saved. If the dict contains modules or optimizers, their state dict will be retrieved and converted automatically.
  - storage_options (Optional[Any]) – Additional options for the CheckpointIO plugin.
  - filter (Optional[Dict[str, Callable[[str, Any], bool]]]) – An optional dictionary containing filter callables that return a boolean indicating whether the given item should be saved (True) or filtered out (False). Each filter key should match a state key, where its filter will be applied to the state_dict generated.
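The `filter` semantics can be sketched in a few lines: each filter callable receives a `(key, value)` pair from the matching entry's state dict and decides whether it is kept. This is an illustration of the documented behavior, not Lightning's internal code:

```python
def apply_filters(state_dicts, filters):
    """Drop state-dict entries whose filter callable returns False.
    `filters` maps a state key to a callable (key, value) -> bool.
    Sketch of the documented `filter` semantics, illustration only."""
    out = {}
    for name, sd in state_dicts.items():
        fn = filters.get(name)
        out[name] = {k: v for k, v in sd.items() if fn is None or fn(k, v)}
    return out

# Hypothetical state: keep only the weight, filter out the bias.
state_dicts = {"model": {"weight": [1.0], "bias": [0.0]}}
filters = {"model": lambda key, value: key != "bias"}
filtered = apply_filters(state_dicts, filters)
```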
- setup_environment()[source]¶
Set up any processes or distributed connections.
This must be called by the framework at the beginning of every process, before any distributed communication takes place.
- setup_module(module)[source]¶
Performs setup for the model, e.g., by wrapping it in another class.
- setup_module_and_optimizers(module, optimizers)[source]¶
Set up a model and multiple optimizers together.
The returned objects are expected to be in the same order they were passed in. The default implementation will call setup_module() and setup_optimizer() on the inputs.
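The default behavior described above can be sketched with a hypothetical stand-in class (not part of the Lightning API): set up the module first, then each optimizer, preserving input order:

```python
class SketchStrategy:
    """Minimal stand-in illustrating the documented default behavior of
    setup_module_and_optimizers(). Hypothetical class, illustration only."""

    def setup_module(self, module):
        # Placeholder for real wrapping (e.g. a distributed wrapper).
        return ("wrapped", module)

    def setup_optimizer(self, optimizer):
        return ("wrapped", optimizer)

    def setup_module_and_optimizers(self, module, optimizers):
        # Module first, then each optimizer, preserving the input order.
        module = self.setup_module(module)
        optimizers = [self.setup_optimizer(opt) for opt in optimizers]
        return module, optimizers

strategy = SketchStrategy()
module, optimizers = strategy.setup_module_and_optimizers("net", ["opt_a", "opt_b"])
```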
- setup_optimizer(optimizer)[source]¶
Performs setup for the optimizer, e.g., by wrapping it in another class.
- teardown()[source]¶
This method is called to tear down the training process.
It is the right place to release memory and free other resources.