TrainingTypePlugin¶
- class pytorch_lightning.plugins.training_type.TrainingTypePlugin(checkpoint_io=None)[source]¶
Bases: abc.ABC
Base class for all training type plugins that change the behaviour of the training, validation and test loops.
- abstract all_gather(tensor, group=None, sync_grads=False)[source]¶
Perform an all_gather on all processes.
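The all_gather contract can be sketched with a hypothetical single-process stand-in (the class below is illustrative, not part of pytorch_lightning); in a real multi-process plugin the gathered result contains one entry per rank:

```python
# Hypothetical stand-in illustrating the all_gather contract: every
# process contributes its tensor and receives the full collection back.
# With world_size == 1 the result is simply [tensor].
class SingleProcessPlugin:
    world_size = 1

    def all_gather(self, tensor, group=None, sync_grads=False):
        # A distributed plugin would return one entry per rank,
        # optionally keeping the autograd graph when sync_grads=True.
        return [tensor] * self.world_size

plugin = SingleProcessPlugin()
print(plugin.all_gather([1.0, 2.0]))  # [[1.0, 2.0]]
```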
- abstract barrier(name=None)[source]¶
Synchronizes all processes, blocking each one until the whole group enters this function.
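To make that contract concrete, here is a hedged sketch of barrier semantics using Python threads in place of distributed processes (a real plugin would synchronize ranks via torch.distributed.barrier):

```python
import threading

# Illustrative only: threads stand in for processes. No worker observes
# an "after" event until every worker has reached the barrier.
barrier = threading.Barrier(parties=4)
order = []

def worker(rank):
    order.append(("before", rank))
    barrier.wait()  # blocks until all 4 workers arrive
    order.append(("after", rank))

threads = [threading.Thread(target=worker, args=(r,)) for r in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every "before" entry precedes every "after" entry.
befores = [i for i, (tag, _) in enumerate(order) if tag == "before"]
afters = [i for i, (tag, _) in enumerate(order) if tag == "after"]
print(max(befores) < min(afters))  # True
```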
- connect(model)[source]¶
Called by the accelerator to connect the accelerator and the model with this plugin.
- model_sharded_context()[source]¶
Provide a hook to create modules in a distributed-aware context. This is useful when we'd like to shard the model instantly, e.g. for extremely large models, where doing so saves memory and initialization time.
Returns: Model parallel context.
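A minimal sketch of how such a hook behaves, assuming a plain contextmanager (the `init_log` bookkeeping below is purely illustrative; a real plugin would e.g. redirect module creation to sharded or meta-device storage):

```python
from contextlib import contextmanager

# Hypothetical stand-in: modules created inside the `with` block would
# be built in a distributed-aware (e.g. sharded) context.
init_log = []

@contextmanager
def model_sharded_context():
    init_log.append("enter sharded context")
    try:
        yield  # the user instantiates the (possibly huge) model here
    finally:
        init_log.append("exit sharded context")

with model_sharded_context():
    init_log.append("instantiate model layers")

print(init_log)
```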
- on_train_batch_start(batch, batch_idx, dataloader_idx=0)[source]¶
Called in the training loop before anything happens for that batch.
- post_dispatch(trainer)[source]¶
Hook to do something after the training/evaluation/prediction finishes.
- pre_dispatch()[source]¶
Hook to do something before the training/evaluation/prediction starts.
- process_dataloader(dataloader)[source]¶
Wraps the dataloader if necessary.
- Parameters
dataloader¶ (Union[Iterable, DataLoader]) – iterable, ideally of type torch.utils.data.DataLoader
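As an illustration of "wrapping if necessary", the hypothetical free function below batches a plain iterable; this is not the real Lightning implementation (which would, for example, inject a distributed-aware sampler), just the wrapping pattern:

```python
# Hypothetical wrapper: turn a plain iterable into fixed-size batches.
# Real plugins return the dataloader unchanged or wrap it for
# distributed use; the batching here is only to show the shape of
# a process_dataloader-style transformation.
def process_dataloader(dataloader, batch_size=2):
    batch = []
    for item in dataloader:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

print(list(process_dataloader(range(5))))  # [[0, 1], [2, 3], [4]]
```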
- abstract reduce(tensor, group=None, reduce_op='mean')[source]¶
Reduces the given tensor (e.g. across GPUs/processes).
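A hedged stand-in for the reduce contract, operating on a plain list of per-process values instead of calling into torch.distributed:

```python
# Illustrative only: combine per-rank values into one according to
# reduce_op, mirroring the semantics a distributed plugin provides.
def reduce(values, reduce_op="mean"):
    if reduce_op == "mean":
        return sum(values) / len(values)
    if reduce_op == "sum":
        return sum(values)
    raise ValueError(f"unsupported reduce_op: {reduce_op!r}")

per_rank_losses = [0.25, 0.5, 0.75, 0.5]  # e.g. one loss per GPU
print(reduce(per_rank_losses))         # 0.5
print(reduce(per_rank_losses, "sum"))  # 2.0
```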
- reduce_boolean_decision(decision)[source]¶
Reduce the early stopping decision across all processes.
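One plausible reduction for a boolean decision is a logical AND across ranks, i.e. stop early only when every process agrees; the sketch below takes the list of per-rank decisions directly and is not the library's implementation:

```python
# Illustrative stand-in: in a real plugin each rank holds one bool and
# the values are combined via a distributed reduction; here we AND a
# plain list of per-rank decisions.
def reduce_boolean_decision(decisions):
    return all(decisions)

print(reduce_boolean_decision([True, True, True]))   # True
print(reduce_boolean_decision([True, False, True]))  # False
```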
- save_checkpoint(checkpoint, filepath)[source]¶
Save model/training states as a checkpoint file through state-dump and file-write.
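The state-dump-and-file-write flow can be sketched as follows; this stand-in uses pickle so it runs anywhere, whereas a real plugin would use torch.save and rank-aware guards:

```python
import os
import pickle
import tempfile

# Hedged sketch: serialize the checkpoint dict and write it to filepath.
def save_checkpoint(checkpoint, filepath):
    with open(filepath, "wb") as f:
        pickle.dump(checkpoint, f)

state = {"epoch": 3, "model_state": {"w": [0.1, 0.2]}}
path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
save_checkpoint(state, path)

# Round-trip: reload the written checkpoint.
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored["epoch"])  # 3
```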
- setup_environment()[source]¶
Set up any processes or distributed connections.
This is called before the LightningModule/DataModule setup hook which allows the user to access the accelerator environment before setup is complete.
- abstract teardown()[source]¶
This method is called to tear down the training process.
It is the right place to release memory and free other resources.
- property handles_gradient_accumulation: bool¶
Whether the plugin handles gradient accumulation internally.
- abstract property is_global_zero: bool¶
Whether the current process is the rank zero process not only on the local node, but for all nodes.
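A common usage pattern for this property is to guard side effects so that only the global rank-zero process performs them; the class below is a hypothetical stand-in, not the library's implementation:

```python
# Illustrative only: is_global_zero as a property derived from a
# hypothetical global_rank attribute.
class RankPlugin:
    def __init__(self, global_rank):
        self.global_rank = global_rank

    @property
    def is_global_zero(self):
        return self.global_rank == 0

messages = []
for rank in range(4):
    plugin = RankPlugin(rank)
    if plugin.is_global_zero:
        # Only rank 0 writes logs / checkpoints in this pattern.
        messages.append(f"rank {rank}: writing logs")

print(messages)  # ['rank 0: writing logs']
```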
- property lightning_module: pytorch_lightning.core.lightning.LightningModule¶
Returns the pure LightningModule without potential wrappers.
- property lightning_restore_optimizer_and_schedulers: bool¶
Override to disable Lightning restoring optimizers/schedulers.
This is useful for plugins which manage restoring optimizers/schedulers.
- property model: Optional[torch.nn.modules.module.Module]¶
Returns the potentially wrapped LightningModule.
- abstract property on_gpu: bool¶
Returns whether the current process runs on a GPU.
- abstract property on_tpu: bool¶
Returns whether the current process runs on a TPU.
- property restore_checkpoint_after_pre_dispatch: bool¶
Override to delay restoring from checkpoint until after pre-dispatch. This is useful when the plugin requires all the setup hooks to run before loading the checkpoint.
- Returns
If True, restore the checkpoint after pre_dispatch.
- property results: Optional[Union[List[Dict[str, float]], List[Any], List[List[Any]]]]¶
Enables plugin-agnostic access to the result returned by the training/evaluation/prediction run.
The result is cached instead of returned directly, because some plugins require transmitting the results from one multiprocessing context to another in a separate step. For example, the plugins that use the “spawn” start-method send the result to the master process through a multiprocessing queue (shared memory).
- abstract property root_device: torch.device¶
Returns the root device.
- property setup_optimizers_in_pre_dispatch: bool¶
Override to delay setting up optimizers and schedulers until after dispatch. This is useful when the TrainingTypePlugin requires operating on the wrapped accelerator model. However, this may break certain precision plugins such as APEX, which require optimizers to be set up earlier.
- Returns
If True, delay optimizer setup until pre_dispatch; otherwise set up optimizers within setup.
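The override pattern for such a boolean property can be sketched as below; both class names are hypothetical and stand in for a base plugin and a subclass that needs the wrapped model before building optimizers:

```python
# Illustrative subclass pattern for opting into deferred optimizer setup.
class BasePlugin:
    @property
    def setup_optimizers_in_pre_dispatch(self):
        return False  # default: build optimizers during setup

class ShardedLikePlugin(BasePlugin):
    @property
    def setup_optimizers_in_pre_dispatch(self):
        # Wait until the model has been wrapped before building optimizers.
        return True

print(BasePlugin().setup_optimizers_in_pre_dispatch)        # False
print(ShardedLikePlugin().setup_optimizers_in_pre_dispatch)  # True
```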
- property should_rank_save_checkpoint: bool¶
Returns whether the checkpoint should be saved (rank-based).