# Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
## [2.0.6] - 2023-07-20

### Fixed
- Fixed `TensorBoardLogger.log_graph` not unwrapping the `_FabricModule` (#17844)
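A minimal sketch of the fixed call path, assuming the current Fabric API; the linear model, the example input, and the `tb_logs` directory are illustrative, not from the changelog:

```python
import torch
from lightning.fabric import Fabric
from lightning.fabric.loggers import TensorBoardLogger

logger = TensorBoardLogger(root_dir="tb_logs")  # illustrative local log dir
fabric = Fabric(accelerator="cpu", devices=1, loggers=logger)

model = fabric.setup_module(torch.nn.Linear(4, 2))  # returns a _FabricModule wrapper
# log_graph now unwraps the _FabricModule before tracing the underlying module
logger.log_graph(model, input_array=torch.randn(1, 4))
```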
## [2.0.5] - 2023-07-07

### Added

- Added validation against misconfigured device selection when using the DeepSpeed strategy (#17952)

### Changed

- Avoid info message when loading 0 entry point callbacks (#17990)

### Fixed

- Fixed the emission of a false-positive warning when calling a method on the Fabric-wrapped module that accepts no arguments (#17875)
- Fixed check for FSDP's flat parameters in all parameter groups (#17914)
- Fixed automatic step tracking in Fabric's CSVLogger (#17942)
- Fixed an issue causing the `torch.set_float32_matmul_precision` info message to show multiple times (#17960)
- Fixed loading model state when `Fabric.load()` is called after `Fabric.setup()` (#17997)
## [2.0.4] - 2023-06-22

### Fixed
## [2.0.3] - 2023-06-07

### Added

- Added support for `Callback` registration through entry points (#17756)
- Added Fabric internal hooks (#17759)

### Changed

### Fixed
## [2.0.2] - 2023-04-24

### Changed

- Enable precision autocast for LightningModule step methods in Fabric (#17439)

### Fixed

## [2.0.1] - 2023-03-30

### Changed
- Generalized `Optimizer` validation to accommodate both FSDP 1.x and 2.x (#16733)
## [2.0.0] - 2023-03-15

### Added

- Added `Fabric.all_reduce` (#16459)
- Added support for saving and loading DeepSpeed checkpoints through `Fabric.save/load()` (#16452)
- Added support for automatically calling `set_epoch` on the `dataloader.batch_sampler.sampler` (#16841)
- Added support for writing logs to remote file systems with the `CSVLogger` (#16880)
- Added support for frozen dataclasses in the optimizer state (#16656)
- Added `lightning.fabric.is_wrapped` to check whether a module, optimizer, or dataloader was already wrapped by Fabric (#16953)
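A minimal sketch of the new `is_wrapped` check; the linear layer is illustrative:

```python
import torch
from lightning.fabric import Fabric, is_wrapped

fabric = Fabric(accelerator="cpu", devices=1)

model = torch.nn.Linear(4, 2)
assert not is_wrapped(model)        # plain nn.Module

model = fabric.setup_module(model)  # now wrapped in a _FabricModule
assert is_wrapped(model)
```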
### Changed

- Fabric now chooses `accelerator="auto", strategy="auto", devices="auto"` as defaults (#16842)
- Checkpoint saving and loading redesign (#16434); see the sketch after this list
  - Changed the method signature of `Fabric.save` and `Fabric.load`
  - Changed the method signature of `Strategy.save_checkpoint` and `Fabric.load_checkpoint`
  - `Fabric.save` accepts a state that can contain model and optimizer references
  - `Fabric.load` can now load state in-place onto models and optimizers
  - `Fabric.load` returns a dictionary of objects that weren't loaded into the state
  - `Strategy.save_checkpoint` and `Fabric.load_checkpoint` are now responsible for accessing the state of the model and optimizers
- `DataParallelStrategy.get_module_state_dict()` and `DDPStrategy.get_module_state_dict()` now correctly extract the state dict without keys prefixed with 'module' (#16487)
- "Native" suffix removal (#16490)
  - `strategy="fsdp_full_shard_offload"` is now `strategy="fsdp_cpu_offload"`
  - `lightning.fabric.plugins.precision.native_amp` is now `lightning.fabric.plugins.precision.amp`
- Enabled all shorthand strategy names that can be supported in the CLI (#16485)
- Renamed `strategy='tpu_spawn'` to `strategy='xla'` and `strategy='tpu_spawn_debug'` to `strategy='xla_debug'` (#16781)
- Changed arguments for precision settings (from `[64|32|16|bf16]` to `["64-true"|"32-true"|"16-mixed"|"bf16-mixed"]`) (#16767)
- The selection `Fabric(strategy="ddp_spawn", ...)` no longer falls back to "ddp" when a cluster environment gets detected (#16780)
- Renamed `setup_dataloaders(replace_sampler=...)` to `setup_dataloaders(use_distributed_sampler=...)` (#16829)
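A minimal sketch of the redesigned checkpointing flow described above; the filename and the extra `"step"` entry are illustrative:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", devices=1)
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = fabric.setup(model, optimizer)

# Fabric.save takes a state dict that may hold live model/optimizer references
state = {"model": model, "optimizer": optimizer, "step": 42}
fabric.save("checkpoint.ckpt", state)

# Fabric.load restores model/optimizer in-place and returns whatever was in
# the checkpoint but not consumed by the given state
state = {"model": model, "optimizer": optimizer}
remainder = fabric.load("checkpoint.ckpt", state)
step = remainder["step"]
```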
### Removed

### Fixed

## [1.9.4] - 2023-03-01

### Added

- Added `Fabric(strategy="auto")` support (#16916)

### Fixed

## [1.9.3] - 2023-02-21

### Fixed

## [1.9.2] - 2023-02-15

### Fixed

- Fixed an attribute error and improved input validation for invalid strategy types being passed to Trainer (#16693)
## [1.9.1] - 2023-02-10

### Fixed

- Fixed error handling for `accelerator="mps"` and `ddp` strategy pairing (#16455)
- Fixed strict availability check for `torch_xla` requirement (#16476)
- Fixed an issue where PL would wrap DataLoaders with XLA's MpDeviceLoader more than once (#16571)
- Fixed the `batch_sampler` reference for DataLoaders wrapped with XLA's MpDeviceLoader (#16571)
- Fixed an import error when `torch.distributed` is not available (#16658)
## [1.9.0] - 2023-01-17

### Added

- Added `Fabric.launch()` to programmatically launch processes (e.g. in Jupyter notebook) (#14992)
- Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the `run` method (#14992)
- Added `Fabric.setup_module()` and `Fabric.setup_optimizers()` to support strategies that need to set up the model before an optimizer can be created (#15185)
- Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (#14967)
- Added `lightning.fabric.accelerators.find_usable_cuda_devices` utility function (#16147)
- Added basic support for LightningModules (#16048)
- Added support for managing callbacks via `Fabric(callbacks=...)` and emitting events through `Fabric.call()` (#16074)
- Added Logger support (#16121); see the example after this list
  - Added `Fabric(loggers=...)` to support different Logger frameworks in Fabric
  - Added `Fabric.log` for logging scalars using multiple loggers
  - Added `Fabric.log_dict` for logging a dictionary of multiple metrics at once
  - Added `Fabric.loggers` and `Fabric.logger` attributes to access the individual logger instances
  - Added support for calling `self.log` and `self.log_dict` in a LightningModule when using Fabric
  - Added access to `self.logger` and `self.loggers` in a LightningModule when using Fabric
- Added `lightning.fabric.loggers.TensorBoardLogger` (#16121)
- Added `lightning.fabric.loggers.CSVLogger` (#16346)
- Added support for a consistent `.zero_grad(set_to_none=...)` on the wrapped optimizer regardless of which strategy is used (#16275)
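A minimal sketch of the logger additions above; the log directories and metric values are illustrative:

```python
from lightning.fabric import Fabric
from lightning.fabric.loggers import CSVLogger, TensorBoardLogger

fabric = Fabric(loggers=[CSVLogger("logs"), TensorBoardLogger("tb_logs")])

fabric.log("loss", 0.25)                      # one scalar, sent to every logger
fabric.log_dict({"loss": 0.25, "acc": 0.90})  # several metrics at once
print(fabric.logger)   # the first (or only) configured logger instance
print(fabric.loggers)  # list of all configured loggers
```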
### Changed

- The `Fabric.run()` method is no longer abstract (#14992)
- The `XLAStrategy` now inherits from `ParallelStrategy` instead of `DDPSpawnStrategy` (#15838)
- Merged the implementation of `DDPSpawnStrategy` into `DDPStrategy` and removed `DDPSpawnStrategy` (#14952)
- The dataloader wrapper returned from `.setup_dataloaders()` now calls `.set_epoch()` on the distributed sampler if one is used (#16101)
- Renamed `Strategy.reduce` to `Strategy.all_reduce` in all strategies (#16370)
- When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set (#16388)
### Removed

- Removed support for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully-Sharded Data Parallel instead (`strategy='fsdp'`) (#16329)

### Fixed
## [1.8.6] - 2022-12-21

- minor cleaning

## [1.8.5] - 2022-12-15

- minor cleaning

## [1.8.4] - 2022-12-08

### Fixed

- Fixed `shuffle=False` having no effect when using DDP/DistributedSampler (#15931)
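A minimal sketch of the behavior this fix restores, shown with the current Fabric API (the successor to LightningLite, which this release still shipped); the dataset and the two-process CPU setup are illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cpu", strategy="ddp", devices=2)
fabric.launch()

dataset = TensorDataset(torch.arange(16).float())
# shuffle=False now carries over to the DistributedSampler that
# setup_dataloaders() injects, so each rank iterates its shard in order
dataloader = fabric.setup_dataloaders(
    DataLoader(dataset, batch_size=4, shuffle=False)
)
```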
## [1.8.3] - 2022-11-22

### Changed

- Temporarily removed support for Hydra multi-run (#15737)

## [1.8.2] - 2022-11-17

### Fixed

- Fixed the automatic fallback from `LightningLite(strategy="ddp_spawn", ...)` to `LightningLite(strategy="ddp", ...)` when on an LSF cluster (#15103)