Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
[2.5.0] - 2024-12-19
[2.5.0] - Added
Added `step` parameter to `TensorBoardLogger.log_hyperparams` to visualize changes during training (#20176) (see the sketch after this list)
Added timeout to DeepSpeedStrategy (#20474)
Added FP8 + FSDP2 + torch.compile examples for Fabric (#20440)
Added RTX 4080 Super to the chips dictionary (#20285)
Added device property to lazy load functionality (#20183)
Added `ddp_find_unused_parameters_true` alias in Fabric's DDPStrategy (#20125)
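A minimal sketch of the new `step` argument, assuming the Fabric `TensorBoardLogger` accepts it as a keyword on `log_hyperparams`; the hyperparameter values and log directory are illustrative:

```python
from lightning.fabric.loggers import TensorBoardLogger

logger = TensorBoardLogger(root_dir="logs")

# Logging the same hyperparameter keys at increasing steps lets TensorBoard
# visualize how they change over the course of training.
for step, lr in enumerate([1e-3, 5e-4, 1e-4]):
    logger.log_hyperparams({"learning_rate": lr}, step=step)
```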
[2.5.0] - Changed
[2.5.0] - Fixed
Fixed use of `convert_module` in FSDP to avoid using more memory than necessary during initialization (#20323)
[2.4.0] - 2024-08-06
[2.4.0] - Added
[2.4.0] - Changed
[2.4.0] - Removed
[2.4.0] - Fixed
[2.3.0] - 2024-06-13
[2.3.0] - Added
Added sanitization for classes before logging them as hyperparameters (#19771)
Enabled consolidating distributed checkpoints through `fabric consolidate` in the new CLI (#19560)
Added the ability to explicitly mark forward methods in Fabric via `_FabricModule.mark_forward_method()` (#19690) (see the sketch after this list)
Added support for PyTorch 2.3 (#19708)
Added `ModelParallelStrategy` to support 2D parallelism (#19846, #19852, #19870, #19872)
Added a call to `torch.distributed.destroy_process_group` in an atexit handler if the process group needs destruction (#19931)
Added support for configuring hybrid sharding by passing a tuple for the `FSDPStrategy(device_mesh=...)` argument (#19504)
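A minimal sketch of `mark_forward_method()`, assuming it accepts the method name as a string; `TinyModel` and its `generate` method are illustrative:

```python
import torch
from lightning.fabric import Fabric


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.layer(x)

    def generate(self, x):  # a custom inference entry point besides forward
        return self.forward(x).argmax(dim=-1)


fabric = Fabric(accelerator="cpu", devices=1)
model = fabric.setup(TinyModel())
# Route calls to `generate` through the strategy wrapper, just like `forward`.
model.mark_forward_method("generate")
preds = model.generate(torch.randn(2, 4))
```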
[2.3.0] - Changed
The `Fabric.rank_zero_first` context manager now uses a barrier without a timeout to avoid interrupting long-running tasks (#19448)
Fabric now raises an error if you forget to call `fabric.backward()` when it is needed by the strategy or precision selection (#19447, #19493)
`_BackwardSyncControl` can now control what to do when gradient accumulation is disabled (#19577)
[2.3.0] - Removed
Removed support for PyTorch 1.13 (#19706)
[2.3.0] - Fixed
Fixed a matrix shape mismatch issue when running a model loaded from a quantized checkpoint (bitsandbytes) (#19886)
[2.2.2] - 2024-04-11
[2.2.2] - Fixed
[2.2.1] - 2024-03-04
[2.2.1] - Fixed
Fixed an issue with `CSVLogger` trying to append to the file from a previous run when the version is set manually (#19446)
[2.2.0] - 2024-02-08
[2.2.0] - Added
Added `lightning.fabric.utilities.ThroughputMonitor` and `lightning.fabric.utilities.Throughput` to track throughput and log it (#18848)
Added `lightning.fabric.utilities.AttributeDict` for convenient dict-attribute access to represent state in scripts (#18943) (see the sketch after this list)
Added support for meta-device initialization and materialization of 4-bit Bitsandbytes layers (#19150)
Added `TransformerEnginePrecision(fallback_compute_dtype=)` to control the dtype of operations that don't support fp8 (#19082)
Added support for clipping gradients by value with FSDP (#19236)
Added a utility function and CLI to consolidate FSDP sharded checkpoints into a single file (#19213)
Added support for re-compiling the model inside `Fabric.setup()` over the FSDP/DDP wrappers (#19280)
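A minimal sketch of `AttributeDict` for bundling script state; the field names are illustrative:

```python
from lightning.fabric.utilities import AttributeDict

# Behaves like a regular dict but also supports attribute-style access,
# which is convenient for holding training state in a script.
state = AttributeDict(epoch=0, best_metric=float("inf"))
state.epoch += 1
state["best_metric"] = 0.42
print(state.epoch, state.best_metric)
```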
[2.2.0] - Changed
`seed_everything()` without passing in a seed no longer randomly selects a seed and now defaults to 0 (#18846) (see the sketch after this list)
Changed the `TransformerEnginePrecision(dtype=)` argument to `weights_dtype` and made it required (#19082)
The columns in the `metrics.csv` file produced by `CSVLogger` are now sorted alphabetically (#19159)
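A minimal sketch of the new `seed_everything()` default, assuming it is importable from the top-level `lightning.fabric` package:

```python
from lightning.fabric import seed_everything

# Since 2.2.0, omitting the seed no longer picks a random one;
# it is equivalent to seeding with 0.
seed_everything()
# An explicit seed still behaves as before.
seed_everything(1234)
```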
[2.2.0] - Removed
Removed support for PyTorch 1.12 (#19300)
[2.2.0] - Fixed
[2.1.4] - 2024-01-31
[2.1.4] - Fixed
[2.1.3] - 2023-12-21
[2.1.3] - Fixed
[2.1.2] - 2023-11-15
[2.1.2] - Fixed
Fixed precision default from environment (#18928)
[2.1.1] - 2023-11-06
[2.1.1] - Changed
Calling a method other than `forward` that invokes submodules is now an error when the model is wrapped (e.g., with DDP) (#18819)
[2.1.1] - Fixed
[2.1.0] - 2023-10-11
[2.1.0] - Added
Added support for the TPU-v4 architecture (#17227)
Added support for XLA’s new PJRT runtime (#17352)
Added support for Fully Sharded Data Parallel (FSDP) training with XLA (#18126, #18424, #18430)
Check for invalid TPU device inputs (#17227)
Added `XLAStrategy(sync_module_states=bool)` to control whether to broadcast the parameters to all devices (#17522)
Added support for joint setup of model and optimizer with FSDP (#17305)
Added support for handling multiple parameter groups in optimizers set up with FSDP (#17305)
Added support for saving and loading sharded model and optimizer state with `FSDPStrategy` (#17323)
Added a warning when calling methods on `_FabricModule` that bypass the strategy-specific wrappers (#17424)
Added `Fabric.init_tensor()` context manager to instantiate tensors efficiently directly on device and dtype (#17488)
Added `Fabric.init_module()` context manager to instantiate large models efficiently directly on device, dtype, and with sharding support (#17462) (see the sketch after this list)
Creates the model parameters in the desired dtype (`torch.float32`, `torch.float64`, `torch.float16`, or `torch.bfloat16`) depending on the 'true' precision choice in `Fabric(precision='32-true'|'64-true'|'16-true'|'bf16-true')`
Handles initialization for FSDP models before wrapping and the ZeRO stage 3 initialization for DeepSpeed before sharding
Added support for empty weight initialization with `Fabric.init_module(empty_init=True)` for checkpoint loading (#17627)
Added support for meta-device initialization with `Fabric.init_module(empty_init=True)` in FSDP (#18122)
Added `lightning.fabric.plugins.Precision.module_init_context()` and `lightning.fabric.strategies.Strategy.module_init_context()` context managers to control model and tensor instantiation (#17462)
`lightning.fabric.strategies.Strategy.tensor_init_context()` context manager to instantiate tensors efficiently directly on device and dtype (#17607)
Run the DDP wrapper in a CUDA stream (#17334)
Added support for true half-precision as `Fabric(precision="16-true"|"bf16-true")` (#17287)
Added support for mixed 8-bit precision as `Fabric(precision="transformer-engine")` using Nvidia's Transformer Engine (#17597)
Added support for linear layer quantization with `Fabric(plugins=BitsandbytesPrecision())` using bitsandbytes (#18655)
Added error messaging for missed `.launch()` when it is required (#17570)
Added support for saving checkpoints with either a full state dict or a sharded state dict via `FSDPStrategy(state_dict_type="full"|"sharded")` (#17526)
Added support for loading a full-state checkpoint file into a sharded model (#17623)
Added support for calling hooks on a LightningModule via `Fabric.call` (#17874)
Added the parameter `Fabric.load(..., strict=True|False)` to enable non-strict loading of partial checkpoint state (#17645)
Added the parameter `Fabric.save(..., filter=...)` to enable saving a partial checkpoint state (#17845)
Added support for loading optimizer states from a full-state checkpoint file (#17747)
Automatically call `xla_model.mark_step()` before saving checkpoints with XLA (#17882)
Automatically call `xla_model.mark_step()` after `optimizer.step()` with XLA (#17883)
Added support for all half-precision modes in the FSDP precision plugin (#17807)
Added `FSDPStrategy(activation_checkpointing_policy=...)` to customize the layer policy for automatic activation checkpointing (requires torch>=2.1) (#18045)
Added a callback for spike detection (#18014)
Added the ability to set the `torch.distributed.fsdp.ShardingStrategy` via string in `FSDPStrategy` (#18087)
Improved error messages when attempting to load a DeepSpeed checkpoint at an invalid path (#17795)
Added `Fabric.load_raw()` for loading raw PyTorch state dict checkpoints for model or optimizer objects (#18049)
Allowed accessing rank information in the main process before processes are launched when using the `XLAStrategy` (#18194)
Added automatic process cleanup to avoid zombie child processes and stalls when exceptions are raised (#18218)
Added validation of user input for `devices` and `num_nodes` when running with `SLURM` or `TorchElastic` (#18292)
Improved the error messaging and instructions when handling custom batch samplers in distributed settings (#18402)
Added support for saving and loading stateful objects other than modules and optimizers (#18513)
Enabled the default process group configuration for FSDP's hybrid sharding (#18583)
Added `lightning.fabric.utilities.suggested_max_num_workers` to assist with setting a good value in distributed settings (#18591)
Added `lightning.fabric.utilities.is_shared_filesystem` utility function to automatically check whether the filesystem is shared between machines (#18586)
Removed support for PyTorch 1.11 (#18691)
Added support for passing the argument `.load_state_dict(..., assign=True|False)` on Fabric-wrapped modules in PyTorch 2.1 or newer (#18690)
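A minimal sketch combining true half precision with `Fabric.init_module()` as listed above; the model, device choice, and precision are illustrative:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=1, precision="bf16-true")
fabric.launch()

# Parameters are created directly in bfloat16 and on the target device,
# avoiding a float32 detour and a later .to() copy.
with fabric.init_module():
    model = torch.nn.Linear(4096, 4096)

model = fabric.setup(model)
assert model.weight.dtype == torch.bfloat16
```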
[2.1.0] - Changed
Allow using iterable-style datasets with TPUs (#17331)
Increased the minimum XLA requirement to 1.13 (#17368)
Fabric argument validation now only raises an error if conflicting settings are set through the CLI (#17679)
DataLoader re-instantiation is now only performed when a distributed sampler is required (#18191)
Improved the formatting of emitted warnings (#18288)
Broadcast and reduction of tensors with XLA-based strategies now preserve the input’s device (#18275)
Due to lack of reliability, Fabric now only runs on one GPU instead of all GPUs in a Jupyter notebook if `devices="auto"` (default) (#18291)
Enabled launching via `torchrun` in a SLURM environment; the `TorchElasticEnvironment` now gets chosen over the `SLURMEnvironment` if both are detected (#18618)
If not set by the user, Lightning will set `OMP_NUM_THREADS` to `num_cpus / num_processes` when launching subprocesses (e.g. when DDP is used) to avoid system overload for CPU-intensive tasks (#18677)
[2.1.0] - Deprecated
Deprecated the `DDPStrategy.is_distributed` property. This strategy is distributed by definition (#17381)
Deprecated the `SingleTPUStrategy` (`strategy="single_tpu"`) in favor of `SingleDeviceXLAStrategy` (`strategy="single_xla"`) (#17383)
Deprecated the `TPUAccelerator` in favor of `XLAAccelerator` (#17383)
Deprecated the `TPUPrecision` in favor of `XLAPrecision` (#17383)
Deprecated the `TPUBf16Precision` in favor of `XLABf16Precision` (#17383)
[2.1.0] - Removed
Removed automatic sharding support with `Fabric.run` or using `fabric.launch(fn)`. This only impacts FSDP and DeepSpeed strategy users. Please instantiate your module under the newly added `fabric.init_module` context manager (#17832)
Removed the unsupported `checkpoint_io` argument from the `FSDPStrategy` (#18192)
[2.1.0] - Fixed
Fixed issue where running on TPUs would select the wrong device index (#17227)
Removed the need to call `.launch()` when using the DP strategy (`strategy="dp"`) (#17931)
Fixed FSDP re-applying activation checkpointing when the user had manually applied it already (#18006)
Fixed FSDP re-wrapping the module root when the user had manually wrapped the model (#18054)
Fixed issue where unexpected exceptions would leave the default torch dtype modified when using true precision settings (#18500)
Fixed redundant input-type casting in FSDP precision (#18630)
Fixed an issue with `find_usable_cuda_devices(0)` incorrectly returning a list of devices (#18722)
Fixed redundant file writes in `CSVLogger` (#18567)
[2.0.9] - 2023-09-14
[2.0.9] - Fixed
Fixed an issue causing the `_FabricOptimizer.state` to remain outdated after loading with `load_state_dict` (#18488)
[2.0.8] - 2023-08-29
[2.0.8] - Changed
On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
[2.0.8] - Fixed
Fixed model parameters getting shared between processes when running with `strategy="ddp_spawn"` and `accelerator="cpu"`; this has a necessary memory impact, as parameters are replicated for each process now (#18238)
Removed false positive warning when using `fabric.no_backward_sync` with XLA strategies (#17761)
Fixed issue where Fabric would not initialize the global rank, world size, and rank-zero-only rank after initialization and before launch (#16966)
Fixed FSDP full-precision `param_dtype` training (`16-mixed`, `bf16-mixed`, and `32-true` configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
[2.0.7] - 2023-08-14
[2.0.7] - Changed
Disabled the auto-detection of the Kubeflow environment (#18137)
[2.0.7] - Fixed
Fixed issue where DDP subprocesses that used Hydra would set Hydra's working directory to the current directory (#18145)
Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177)
Fixed an issue with `Fabric.all_reduce()` not performing an in-place operation for all backends consistently (#18235)
[2.0.6] - 2023-07-20
[2.0.6] - Fixed
Fixed `TensorBoardLogger.log_graph` not unwrapping the `_FabricModule` (#17844)
[2.0.5] - 2023-07-07
[2.0.5] - Added
Added validation against misconfigured device selection when using the DeepSpeed strategy (#17952)
[2.0.5] - Changed
Avoid info message when loading 0 entry point callbacks (#17990)
[2.0.5] - Fixed
Fixed the emission of a false-positive warning when calling a method on the Fabric-wrapped module that accepts no arguments (#17875)
Fixed check for FSDP's flat parameters in all parameter groups (#17914)
Fixed automatic step tracking in Fabric's CSVLogger (#17942)
Fixed an issue causing the `torch.set_float32_matmul_precision` info message to show multiple times (#17960)
Fixed loading model state when `Fabric.load()` is called after `Fabric.setup()` (#17997)
[2.0.4] - 2023-06-22
[2.0.4] - Fixed
[2.0.3] - 2023-06-07
[2.0.3] - Added
Added support for `Callback` registration through entry points (#17756)
[2.0.3] - Changed
[2.0.3] - Fixed
[2.0.2] - 2023-04-24
[2.0.2] - Changed
Enabled precision autocast for LightningModule step methods in Fabric (#17439)
[2.0.2] - Fixed
[2.0.1] - 2023-03-30
[2.0.1] - Changed
Generalized `Optimizer` validation to accommodate both FSDP 1.x and 2.x (#16733)
[2.0.0] - 2023-03-15
[2.0.0] - Added
Added `Fabric.all_reduce` (#16459)
Added support for saving and loading DeepSpeed checkpoints through `Fabric.save/load()` (#16452)
Added support for automatically calling `set_epoch` on the `dataloader.batch_sampler.sampler` (#16841)
Added support for writing logs to remote file systems with the `CSVLogger` (#16880)
Added support for frozen dataclasses in the optimizer state (#16656)
Added `lightning.fabric.is_wrapped` to check whether a module, optimizer, or dataloader was already wrapped by Fabric (#16953) (see the sketch after this list)
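A minimal sketch of `is_wrapped`, assuming it is importable from the top-level `lightning.fabric` package; the model is illustrative:

```python
import torch
from lightning.fabric import Fabric, is_wrapped

fabric = Fabric(accelerator="cpu", devices=1)
model = torch.nn.Linear(8, 8)
print(is_wrapped(model))   # False: plain nn.Module

model = fabric.setup(model)
print(is_wrapped(model))   # True: now a Fabric-wrapped module
```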
[2.0.0] - Changed
Fabric now chooses `accelerator="auto", strategy="auto", devices="auto"` as defaults (#16842)
Checkpoint saving and loading redesign (#16434) (see the sketch after this list)
Changed the method signature of `Fabric.save` and `Fabric.load`
Changed the method signature of `Strategy.save_checkpoint` and `Fabric.load_checkpoint`
`Fabric.save` accepts a state that can contain model and optimizer references
`Fabric.load` can now load state in-place onto models and optimizers
`Fabric.load` returns a dictionary of objects that weren't loaded into the state
`Strategy.save_checkpoint` and `Fabric.load_checkpoint` are now responsible for accessing the state of the model and optimizers
`DataParallelStrategy.get_module_state_dict()` and `DDPStrategy.get_module_state_dict()` now correctly extract the state dict without keys prefixed with 'module' (#16487)
"Native" suffix removal (#16490)
`strategy="fsdp_full_shard_offload"` is now `strategy="fsdp_cpu_offload"`
`lightning.fabric.plugins.precision.native_amp` is now `lightning.fabric.plugins.precision.amp`
Enabled all shorthand strategy names that can be supported in the CLI (#16485)
Renamed `strategy='tpu_spawn'` to `strategy='xla'` and `strategy='tpu_spawn_debug'` to `strategy='xla_debug'` (#16781)
Changed arguments for precision settings (from [64|32|16|bf16] to ["64-true"|"32-true"|"16-mixed"|"bf16-mixed"]) (#16767)
The selection `Fabric(strategy="ddp_spawn", ...)` no longer falls back to "ddp" when a cluster environment gets detected (#16780)
Renamed `setup_dataloaders(replace_sampler=...)` to `setup_dataloaders(use_distributed_sampler=...)` (#16829)
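A minimal sketch of the redesigned save/load flow described above, assuming `Fabric.save(path, state)` and `Fabric.load(path, state)` signatures; the file name and state keys are illustrative:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=1)
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

# The state can mix module/optimizer references with plain values.
state = {"model": model, "optimizer": optimizer, "step": 0}
fabric.save("checkpoint.ckpt", state)

# Loading restores modules and optimizers in-place; anything not consumed
# into the given state is returned as a dictionary.
remainder = fabric.load("checkpoint.ckpt", state)
```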
[2.0.0] - Removed
[2.0.0] - Fixed
[1.9.4] - 2023-03-01
[1.9.4] - Added
Added `Fabric(strategy="auto")` support (#16916)
[1.9.4] - Fixed
[1.9.3] - 2023-02-21
[1.9.3] - Fixed
[1.9.2] - 2023-02-15
[1.9.2] - Fixed
Fixed an attribute error and improved input validation for invalid strategy types being passed to Trainer (#16693)
[1.9.1] - 2023-02-10
[1.9.1] - Fixed
Fixed error handling for `accelerator="mps"` and `ddp` strategy pairing (#16455)
Fixed strict availability check for `torch_xla` requirement (#16476)
Fixed an issue where PL would wrap DataLoaders with XLA's MpDeviceLoader more than once (#16571)
Fixed the batch_sampler reference for DataLoaders wrapped with XLA's MpDeviceLoader (#16571)
Fixed an import error when `torch.distributed` is not available (#16658)
[1.9.0] - 2023-01-17
[1.9.0] - Added
Added `Fabric.launch()` to programmatically launch processes (e.g. in a Jupyter notebook) (#14992)
Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the `run` method (#14992)
Added `Fabric.setup_module()` and `Fabric.setup_optimizers()` to support strategies that need to set up the model before an optimizer can be created (#15185)
Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (#14967)
Added `lightning.fabric.accelerators.find_usable_cuda_devices` utility function (#16147)
Added basic support for LightningModules (#16048)
Added support for managing callbacks via `Fabric(callbacks=...)` and emitting events through `Fabric.call()` (#16074)
Added Logger support (#16121) (see the sketch after this list)
Added `Fabric(loggers=...)` to support different Logger frameworks in Fabric
Added `Fabric.log` for logging scalars using multiple loggers
Added `Fabric.log_dict` for logging a dictionary of multiple metrics at once
Added `Fabric.loggers` and `Fabric.logger` attributes to access the individual logger instances
Added support for calling `self.log` and `self.log_dict` in a LightningModule when using Fabric
Added access to `self.logger` and `self.loggers` in a LightningModule when using Fabric
Added `lightning.fabric.loggers.TensorBoardLogger` (#16121)
Added `lightning.fabric.loggers.CSVLogger` (#16346)
Added support for a consistent `.zero_grad(set_to_none=...)` on the wrapped optimizer regardless of which strategy is used (#16275)
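A minimal sketch of the logger support listed above, assuming `CSVLogger` takes the root directory as its first argument and that `Fabric.log`/`Fabric.log_dict` accept a `step` keyword; the metric names and values are illustrative:

```python
from lightning.fabric import Fabric
from lightning.fabric.loggers import CSVLogger

fabric = Fabric(accelerator="cpu", devices=1, loggers=CSVLogger("logs"))

# Scalars are forwarded to every configured logger.
fabric.log("train/loss", 0.42, step=0)
fabric.log_dict({"train/loss": 0.40, "train/acc": 0.81}, step=1)
```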
[1.9.0] - Changed
The `Fabric.run()` method is no longer abstract (#14992)
The `XLAStrategy` now inherits from `ParallelStrategy` instead of `DDPSpawnStrategy` (#15838)
Merged the implementation of `DDPSpawnStrategy` into `DDPStrategy` and removed `DDPSpawnStrategy` (#14952)
The dataloader wrapper returned from `.setup_dataloaders()` now calls `.set_epoch()` on the distributed sampler if one is used (#16101)
Renamed `Strategy.reduce` to `Strategy.all_reduce` in all strategies (#16370)
When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set (#16388)
[1.9.0] - Removed
Removed support for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully Sharded Data Parallel instead (`strategy='fsdp'`) (#16329)
[1.9.0] - Fixed
[1.8.6] - 2022-12-21
Minor cleaning
[1.8.5] - 2022-12-15
Minor cleaning
[1.8.4] - 2022-12-08
[1.8.4] - Fixed
Fixed `shuffle=False` having no effect when using DDP/DistributedSampler (#15931)
[1.8.3] - 2022-11-22
[1.8.3] - Changed
Temporarily removed support for Hydra multi-run (#15737)
[1.8.2] - 2022-11-17
[1.8.2] - Fixed
Fixed the automatic fallback from `LightningLite(strategy="ddp_spawn", ...)` to `LightningLite(strategy="ddp", ...)` when on an LSF cluster (#15103)