Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
[1.4.8] - 2021-09-22
- Fixed error reporting in DDP process reconciliation when processes are launched by an external agent (#9389)
- Added PL_RECONCILE_PROCESS environment variable to enable process reconciliation regardless of cluster environment settings (#9389)
- Fixed `add_argparse_args` raising `TypeError` when args are typed as `typing.Generic` in Python 3.6 (#9554)
- Fixed back-compatibility for saving hyperparameters from a single container and inferring its argument name by reverting #9125 (#9642)
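The PL_RECONCILE_PROCESS entry above describes an opt-in flag read from the environment. As a rough illustration of that pattern only, the sketch below uses a hypothetical helper, `reconciliation_enabled`. Both the accepted value `"1"` and the fallback rule are assumptions, not Lightning's documented behaviour.

```python
import os


def reconciliation_enabled(launched_by_external_agent: bool) -> bool:
    """Hypothetical helper: decide whether to run process reconciliation."""
    # Assumption: the flag counts as set when the variable equals "1".
    forced = os.environ.get("PL_RECONCILE_PROCESS", "0") == "1"
    # Assumption: without the flag, reconcile only when the workers were
    # not launched by an external agent.
    return forced or not launched_by_external_agent


os.environ["PL_RECONCILE_PROCESS"] = "1"
print(reconciliation_enabled(launched_by_external_agent=True))
```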
[1.4.7] - 2021-09-14
[1.4.6] - 2021-09-07
- Fixed an issue with export to ONNX format when a model has multiple inputs (#8800)
- Removed deprecation warnings being called for `on_{task}_dataloader` (#9279)
- Fixed save/load/resume from checkpoint for DeepSpeed Plugin (#8397, #8644, #8627)
- Fixed `EarlyStopping` running on train epoch end when `check_val_every_n_epoch>1` is set (#9156)
- Fixed an issue with logger outputs not being finalized correctly after prediction runs (#8333)
- Fixed the Apex and DeepSpeed plugin closure running after the `on_before_optimizer_step` hook (#9288)
- Fixed the Native AMP plugin closure not running with manual optimization (#9288)
- Fixed bug where data-loading functions were not getting the correct running stage passed (#8858)
- Fixed intra-epoch evaluation outputs staying in memory when the respective `*_epoch_end` hook wasn’t overridden (#9261)
- Fixed error handling in DDP process reconciliation when `_sync_dir` was not initialized (#9267)
- Fixed PyTorch Profiler not enabled for manual optimization (#9316)
- Fixed inspection of other args when a container is specified in `save_hyperparameters` (#9125)
- Fixed signature of `Timer.on_train_epoch_end` and `StochasticWeightAveraging.on_train_epoch_end` to prevent unwanted deprecation warnings (#9347)
[1.4.5] - 2021-08-31
- Fixed reduction using `self.log(sync_dist=True, reduce_fx={mean,max})` (#9142)
- Fixed not setting a default value for `max_epochs` if `max_time` was specified on the `Trainer` constructor (#9072)
- Fixed the CometLogger so that it no longer modifies the metrics in place; it now creates a copy of the metrics before performing any operations (#9150)
- Fixed `DDP` “CUDA error: initialization error” due to a `copy` instead of `deepcopy` on `ResultCollection` (#9239)
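The last two fixes both hinge on how objects are copied: a shallow copy shares nested objects with the original, while a deep copy does not. The standard-library sketch below shows the difference. The `metrics` dictionary is made up for illustration and is unrelated to the internals of `ResultCollection` or the CometLogger.

```python
import copy

metrics = {"loss": [0.9, 0.7], "epoch": 3}

shallow = copy.copy(metrics)     # new dict, but the inner list is shared
deep = copy.deepcopy(metrics)    # new dict and a new inner list

shallow["loss"].append(0.5)      # also visible through `metrics`
deep["loss"].append(0.1)         # only visible through `deep`

print(metrics["loss"])
print(deep["loss"])
```

Mutating through the shallow copy changes the original's nested list, which is the kind of aliasing a deep copy avoids.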
[1.4.4] - 2021-08-24
[1.4.3] - 2021-08-17
- Fixed plateau scheduler stepping on incomplete epoch (#8861)
- Fixed infinite loop with `CycleIterator` and multiple loaders (#8889)
- Fixed `StochasticWeightAveraging` with a list of learning rates not applying them to each param group (#8747)
- Restore original loaders if replaced by entrypoint (#8885)
- Fixed lost reference to `_Metadata` object in `ResultMetricCollection` (#8932)
- Ensure the existence of `DDPPlugin._sync_dir` in `reconciliate_processes` (#8939)
[1.4.2] - 2021-08-10
- Fixed recursive call for `apply_to_collection(include_none=False)` (#8719)
- Fixed truncated backprop through time enablement when set as a property on the LightningModule and not the Trainer (#8804)
- Fixed comments and exception message for metrics_to_scalars (#8782)
- Fixed typo in LightningLoggerBase.after_save_checkpoint docstring (#8737)
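The first fix in this release concerns forwarding `include_none` through recursive calls. The pure-Python sketch below illustrates that idea with a hypothetical stand-in, `apply_sketch`. It is not Lightning's `apply_to_collection`, and it assumes that `include_none=False` drops elements for which the function returns `None`.

```python
from typing import Any, Callable


def apply_sketch(data: Any, dtype: type, fn: Callable, include_none: bool = True) -> Any:
    """Hypothetical stand-in, not Lightning's `apply_to_collection`."""
    if isinstance(data, dtype):
        return fn(data)
    if isinstance(data, dict):
        out = {}
        for key, value in data.items():
            # Forward `include_none` so nested collections are filtered too.
            result = apply_sketch(value, dtype, fn, include_none)
            if include_none or result is not None:
                out[key] = result
        return out
    if isinstance(data, (list, tuple)):
        results = [apply_sketch(v, dtype, fn, include_none) for v in data]
        kept = [r for r in results if include_none or r is not None]
        return type(data)(kept)
    return data  # other types are returned untouched


def positive_or_none(x: int):
    return x if x > 0 else None


nested = {"a": 1, "b": {"c": -2, "d": [3, -4]}}
print(apply_sketch(nested, int, positive_or_none, include_none=False))
```

If the recursive calls did not receive `include_none`, the nested `None` results would survive in the inner dictionary and list.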
[1.4.1] - 2021-08-03
- Fixed `trainer.fit_loop.split_idx` always returning `None` (#8601)
- Fixed references for `ResultCollection.extra` (#8622)
- Fixed reference issues during epoch end result collection (#8621)
- Fixed horovod auto-detection when horovod is not installed and the launcher is `mpirun` (#8610)
- Fixed an issue with `training_step` outputs not getting collected correctly for `training_epoch_end` (#8613)
- Fixed distributed types support for CPUs (#8667)
- Fixed a deadlock issue with DDP and torchelastic (#8655)
- Fixed `accelerator=ddp` choice for CPU (#8645)
[1.4.0] - 2021-07-27
[1.4.0] - Added
- Added `extract_batch_size` utility and corresponding tests to extract batch dimension from multiple batch types (#8357)
- Added support for named parameter groups in `LearningRateMonitor` (#7987)
- Added `dataclass` support for `pytorch_lightning.utilities.apply_to_collection` (#7935)
- Added support to `LightningModule.to_torchscript` for saving to custom filesystems with `fsspec` (#7617)
- Added `KubeflowEnvironment` for use with the `PyTorchJob` operator in Kubeflow
- Added LightningCLI support for config files on object stores (#7521)
- Added `ModelPruning(prune_on_train_epoch_end=True|False)` to choose when to apply pruning (#7704)
- Added support for checkpointing based on a provided time interval during training (#7515)
- Progress tracking
- Added support for passing a `LightningDataModule` positionally as the second argument to `trainer.{validate,test,predict}` (#7431)
- Added argument `trainer.predict(ckpt_path)` (#7430)
- Added `clip_grad_by_value` support for TPUs (#7025)
- Added support for passing any class to `is_overridden` (#7918)
- Added `sub_dir` parameter to `TensorBoardLogger` (#6195)
- Added correct `dataloader_idx` to batch transfer hooks (#6241)
- Added `include_none=bool` argument to `apply_to_collection` (#7769)
- Added `apply_to_collections` to apply a function to two zipped collections (#7769)
- Added `ddp_fully_sharded` support (#7487)
- Added `should_rank_save_checkpoint` property to Training Plugins (#7684)
- Added `log_grad_norm` hook to `LightningModule` to customize the logging of gradient norms (#7873)
- Added `save_config_filename` init argument to `LightningCLI` to ease resolving name conflicts (#7741)
- Added `save_config_overwrite` init argument to `LightningCLI` to ease overwriting existing config files (#8059)
- Added reset dataloader hooks to Training Plugins and Accelerators (#7861)
- Added trainer stage hooks for Training Plugins and Accelerators (#7864)
- Added the `on_before_optimizer_step` hook (#8048)
- Added IPU Accelerator (#7867)
- Fault-tolerant training
  - Added `{,load_}state_dict` to `ResultCollection` (#7948)
  - Added `{,load_}state_dict` to `Loops` (#8197)
  - Set `Loop.restarting=False` at the end of the first iteration (#8362)
  - Save the loops state with the checkpoint (opt-in) (#8362)
  - Save a checkpoint to restore the state on exception (opt-in) (#8362)
  - Added `state_dict` and `load_state_dict` utilities for `CombinedLoader` + utilities for dataloader (#8364)
- Added `rank_zero_only` to `LightningModule.log` function (#7966)
- Added `metric_attribute` to `LightningModule.log` function (#7966)
- Added a warning if `Trainer(log_every_n_steps)` is a value too high for the training dataloader (#7734)
- Added LightningCLI support for argument links applied on instantiation (#7895)
- Added LightningCLI support for configurable callbacks that should always be present (#7964)
- Added DeepSpeed Infinity Support, and updated to DeepSpeed 0.4.0 (#7234)
- Added support for `torch.nn.UninitializedParameter` in `ModelSummary` (#7642)
- Added support for `LightningModule.save_hyperparameters` when `LightningModule` is a dataclass (#7992)
- Added support for overriding `optimizer_zero_grad` and `optimizer_step` when using accumulate_grad_batches (#7980)
- Added `logger` boolean flag to `save_hyperparameters` (#7960)
- Added support for calling scripts using the module syntax (`python -m package.script`) (#8073)
- Added support for optimizers and learning rate schedulers to `LightningCLI` (#8093)
- Added XLA Profiler (#8014)
- Added `PrecisionPlugin.{pre,post}_backward` (#8328)
- Added `on_load_checkpoint` and `on_save_checkpoint` hooks to the `PrecisionPlugin` base class (#7831)
- Added `max_depth` parameter in `ModelSummary` (#8062)
- Added `XLAStatsMonitor` callback (#8235)
- Added `restore` function and `restarting` attribute to base `Loop` (#8247)
- Added `FastForwardSampler` and `CaptureIterableDataset` (#8307)
- Added support for `save_hyperparameters` in `LightningDataModule` (#3792)
- Added the `ModelCheckpoint(save_on_train_epoch_end)` to choose when to run the saving logic (#8389)
- Added `LSFEnvironment` for distributed training with the LSF resource manager `jsrun` (#5102)
- Added support for `accelerator='cpu'|'gpu'|'tpu'|'ipu'|'auto'` (#7808)
- Added `tpu_spawn_debug` to plugin registry (#7933)
- Enabled traditional/manual launching of DDP processes through `LOCAL_RANK` and `NODE_RANK` environment variable assignments (#7480)
- Added `quantize_on_fit_end` argument to `QuantizationAwareTraining` (#8464)
- Added experimental support for loop specialization (#8226)
- Added support for `devices` flag to Trainer (#8440)
- Added private `prevent_trainer_and_dataloaders_deepcopy` context manager on the `LightningModule` (#8472)
- Added support for providing callables to the Lightning CLI instead of types (#8400)
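The fault-tolerant training entries in this list revolve around a `state_dict` / `load_state_dict` pair plus a `restarting` flag. The sketch below shows that general pattern with a hypothetical `ProgressSketch` class. It is not a Lightning class, and which fields are persisted is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProgressSketch:
    """Hypothetical progress tracker; not a Lightning class."""
    batches_seen: int = 0
    restarting: bool = False

    def state_dict(self) -> dict:
        # Persist only the counter; `restarting` is a runtime flag.
        return {"batches_seen": self.batches_seen}

    def load_state_dict(self, state: dict) -> None:
        self.batches_seen = state["batches_seen"]
        self.restarting = True  # signal that the next run resumes mid-way


tracker = ProgressSketch()
for _ in range(3):
    tracker.batches_seen += 1
checkpoint = tracker.state_dict()  # e.g. captured when an exception occurs

resumed = ProgressSketch()
resumed.load_state_dict(checkpoint)
print(resumed.batches_seen, resumed.restarting)
```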
[1.4.0] - Changed
- Decoupled device parsing logic from Accelerator connector to Trainer (#8180)
- Changed the `Trainer`’s `checkpoint_callback` argument to allow only boolean values (#7539)
- Log epoch metrics before the `on_evaluation_end` hook (#7272)
- Explicitly disallow calling `self.log(on_epoch=False)` during epoch-only or single-call hooks (#7874)
- Changed these `Trainer` methods to be protected: `call_setup_hook`, `call_configure_sharded_model`, `pre_dispatch`, `dispatch`, `post_dispatch`, `call_teardown_hook`, `run_train`, `run_sanity_check`, `run_evaluate`, `run_evaluation`, `run_predict`, `track_output_for_epoch_end`
- Changed `metrics_to_scalars` to work with any collection or value (#7888)
- Changed `clip_grad_norm` to use `torch.nn.utils.clip_grad_norm_` (#7025)
- Validation is now always run inside the training epoch scope (#7357)
- `ModelCheckpoint` now runs at the end of the training epoch by default (#8389)
- `EarlyStopping` now runs at the end of the training epoch by default (#8286)
- Refactored Loops
  - Moved attributes `global_step`, `current_epoch`, `max/min_steps`, `max/min_epochs`, `batch_idx`, and `total_batch_idx` to TrainLoop (#7437)
  - Refactored result handling in training loop (#7506)
  - Moved attributes `hiddens` and `split_idx` to TrainLoop (#7507)
  - Refactored the logic around manual and automatic optimization inside the optimizer loop (#7526)
  - Simplified “should run validation” logic (#7682)
  - Simplified logic for updating the learning rate for schedulers (#7682)
  - Removed the `on_epoch` guard from the “should stop” validation check (#7701)
  - Refactored internal loop interface; added new classes `FitLoop`, `TrainingEpochLoop`, `TrainingBatchLoop` (#7871, #8077)
  - Removed `pytorch_lightning/trainer/training_loop.py` (#7985)
  - Refactored evaluation loop interface; added new classes `DataLoaderLoop`, `EvaluationLoop`, `EvaluationEpochLoop` (#7990, #8077)
  - Removed `pytorch_lightning/trainer/evaluation_loop.py` (#8056)
  - Restricted public access to several internal functions (#8024)
  - Refactored trainer `_run_*` functions and separate evaluation loops (#8065)
  - Refactored prediction loop interface; added new classes `PredictionLoop`, `PredictionEpochLoop` (#7700, #8077)
  - Removed `pytorch_lightning/trainer/predict_loop.py` (#8094)
  - Moved result teardown to the loops (#8245)
  - Improve `Loop` API to better handle children `state_dict` and `progress` (#8334)
- Refactored logging
  - Renamed and moved `core/step_result.py` to `trainer/connectors/logger_connector/result.py` (#7736)
  - Dramatically simplify the `LoggerConnector` (#7882)
  - `trainer.{logged,progress_bar,callback}_metrics` are now updated on-demand (#7882)
  - Completely overhaul the `Result` object in favor of `ResultMetric` (#7882)
  - Improve epoch-level reduction time and overall memory usage (#7882)
  - Allow passing `self.log(batch_size=...)` (#7891)
  - Each of the training loops now keeps its own results collection (#7891)
  - Remove `EpochResultStore` and `HookResultStore` in favor of `ResultCollection` (#7909)
  - Remove `MetricsHolder` (#7909)
- Moved `ignore_scalar_return_in_dp` warning suppression to the DataParallelPlugin class (#7421)
- Changed the behaviour when logging evaluation step metrics to no longer append `/epoch_*` to the metric name (#7351)
- Raised `ValueError` when a `None` value is `self.log`-ed (#7771)
- Changed `resolve_training_type_plugins` to allow setting `num_nodes` and `sync_batchnorm` from `Trainer` setting (#7026)
- Default `seed_everything(workers=True)` in the `LightningCLI` (#7504)
- Changed `model.state_dict()` in `CheckpointConnector` to allow `training_type_plugin` to customize the model’s `state_dict()` (#7474)
- `MLFlowLogger` now uses the env variable `MLFLOW_TRACKING_URI` as default tracking URI (#7457)
- Changed `Trainer` arg and functionality from `reload_dataloaders_every_epoch` to `reload_dataloaders_every_n_epochs` (#5043)
- Changed `WandbLogger(log_model={True/'all'})` to log models as artifacts (#6231)
- MLFlowLogger now accepts `run_name` as a constructor argument (#7622)
- Changed `teardown()` in `Accelerator` to allow `training_type_plugin` to customize `teardown` logic (#7579)
- `Trainer.fit` now raises an error when using manual optimization with unsupported features such as `gradient_clip_val` or `accumulate_grad_batches` (#7788)
- Accelerator hooks are called regardless of whether `LightningModule` overrides the same hooks (#7826)
- Moved profilers to their own file (#7822)
- The `on_after_backward` hook is now called on accumulating iterations. Use the `on_before_optimizer_step` hook to mimic the old behaviour (#8328)
- The mixed precision loss is no longer unscaled before the `on_after_backward` hook. Use the `on_before_optimizer_step` hook to mimic the old behaviour (#8328)
- The `TrainingTypePlugin.{pre,post}_backward` hooks no longer take the `optimizer, opt_idx, should_accumulate` arguments (#8328)
- The `PrecisionPlugin.backward` hooks no longer return a value (#8328)
- The `PrecisionPlugin.backward` hooks no longer take a `should_accumulate` argument (#8328)
- Added the `on_before_backward` hook (#7865)
- `LightningCLI` now aborts with a clearer message if config already exists and disables save config during `fast_dev_run` (#7963)
- Saved the `LightningCLI` config on `setup` and only on the main process (#8017)
- Dropped the `LightningCLI` `ArgumentParser` when pickling (#8017)
- Skip `broadcast` if distributed not initialized for the spawn plugins (#8017)
- `Trainer(resume_from_checkpoint=...)` now restores the model directly after `LightningModule.setup()`, which is before `LightningModule.configure_sharded_model()` (#7652)
- Moved `torch.cuda.set_device()` to enable collective calls earlier in setup (#8312)
- Used XLA utility API to move data to CPU (Single TPU core) (#8078)
- Improved error messages in `replace_sampler` when the `DataLoader` attributes are not included in the signature or the signature is missing optional arguments (#8519)
- Moved `DeviceDtypeModuleMixin` and `HyperparametersMixin` mixin to `core` (#8396)
- Return the `default_root_dir` as the `log_dir` when the logger is a `LoggerCollection` (#8187)
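One change in this list replaces a boolean "reload every epoch" switch with an integer interval, `reload_dataloaders_every_n_epochs`. The sketch below shows how an interval of that kind can be evaluated. The helper `should_reload` is hypothetical, and both the modulo condition and the treatment of `0` are assumptions rather than Lightning's actual logic.

```python
def should_reload(current_epoch: int, every_n_epochs: int) -> bool:
    """Hypothetical helper; the exact condition Lightning uses is not shown here."""
    if every_n_epochs <= 0:
        return False  # assumed meaning: never reload after the initial load
    return current_epoch % every_n_epochs == 0


reload_epochs = [epoch for epoch in range(6) if should_reload(epoch, every_n_epochs=2)]
print(reload_epochs)
```

With an interval of `1` the helper returns `True` for every epoch, which matches the old boolean flag being set.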
[1.4.0] - Deprecated
- Deprecated `LightningModule.loaded_optimizer_states_dict` (#8229)
- Standardized the dataloaders arguments of `trainer.{fit,validate,test,tune}` (#7431)
- Deprecated `DataModule` properties: `has_prepared_data`, `has_setup_fit`, `has_setup_validate`, `has_setup_test`, `has_setup_predict`, `has_teardown_fit`, `has_teardown_validate`, `has_teardown_test`, `has_teardown_predict` (#7657)
- Deprecated `TrainerModelHooksMixin` in favor of `pytorch_lightning.utilities.signature_utils` (#7422)
- Deprecated `num_nodes` and `sync_batchnorm` arguments in `DDPPlugin` and `DDPSpawnPlugin` (#7026)
- Deprecated `self.log(sync_dist_op)` in favor of `self.log(reduce_fx)` (#7891)
- Deprecated `is_overridden(model=...)` in favor of `is_overridden(instance=...)` (#7918)
- Deprecated automatically detaching returned extras with grads (#7994)
- Deprecated default value of `monitor` argument in EarlyStopping callback to enforce `monitor` as a required argument (#7907)
- Deprecated importing `rank_zero_{warn,deprecation}` directly from `pytorch_lightning.utilities.distributed` (#8085)
- Deprecated the use of `CheckpointConnector.hpc_load()` in favor of `CheckpointConnector.restore()` (#7652)
- Deprecated `ModelCheckpoint(every_n_val_epochs)` in favor of `ModelCheckpoint(every_n_epochs)` (#8383)
- Deprecated `DDPPlugin.task_idx` in favor of `DDPPlugin.local_rank` (#8203)
- Deprecated the `Trainer.train_loop` property in favor of `Trainer.fit_loop` (#8025)
- Deprecated the `Trainer.disable_validation` property in favor of `not Trainer.enable_validation` (#8291)
- Deprecated `mode` parameter in `ModelSummary` in favor of `max_depth` (#8062)
- Deprecated `reload_dataloaders_every_epoch` argument of `Trainer` in favor of `reload_dataloaders_every_n_epochs` (#5043)
- Deprecated `distributed_backend` argument for `Trainer` (#8575)
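Many of the deprecations above rename a keyword argument while keeping the old name working for a while. The sketch below shows one common way to do that with the standard `warnings` module. The function `is_overridden_sketch` and its simplified check are hypothetical and do not reproduce Lightning's `is_overridden`.

```python
import warnings
from typing import Optional


def is_overridden_sketch(method_name: str, instance: Optional[object] = None,
                         model: Optional[object] = None) -> bool:
    """Hypothetical function showing a renamed keyword; not Lightning's `is_overridden`."""
    if model is not None:
        # Old keyword still accepted, but flagged for removal.
        warnings.warn("`model=...` is deprecated, use `instance=...`",
                      DeprecationWarning, stacklevel=2)
        instance = model
    # Simplified check: is the attribute defined on the instance's own class?
    return method_name in vars(type(instance))


class Child:
    def training_step(self):
        return None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = is_overridden_sketch("training_step", model=Child())

print(result, caught[0].category.__name__)
```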
[1.4.0] - Removed
- Dropped official support/testing for PyTorch <1.6 (#8288)
- Removed `ProfilerConnector` (#7654)
- Pruned deprecated classification metrics from `pytorch_lightning.metrics.functional.classification` (#7499)
- Removed deprecated data parallel classes `LightningDataParallel` and `LightningDistributedDataParallel` from `pytorch_lightning.overrides.data_parallel` (#7510)
- Removed deprecated trainer attributes `get_model` and `accelerator_backend` (#7502)
- Removed support for automatically monitoring the `val_loss` key with `ModelCheckpoint`. Pass your `monitor` of choice to the `ModelCheckpoint` instance instead (#8293)
- Removed support for `self.log(tbptt_reduce_fx)` and `self.log(tbptt_pad_token)`. Please open a discussion explaining your use case if you relied on these (#7644)
- Removed deprecated utils modules `model_utils`, `warning_utils`, `xla_device_utils` and partially `argparse_utils` (#7503)
- Removed `RPCPlugin` and `RPCSequentialPlugin`. If you were successfully using these plugins, please open a GitHub discussion about your use case (#8101)
- Removed deprecated trainer attributes `on_cpu`, `on_tpu`, `use_tpu`, `on_gpu`, `use_dp`, `use_ddp`, `use_ddp2`, `use_horovod`, `use_single_gpu` (#7501)
- Removed deprecated `optimizer` argument in `LightningModule.manual_backward()`; toggling optimizers in manual optimization should be done using `LightningModule.{un}toggle_optimizer()` (#8287)
- Removed DeepSpeed FP16 Exception as FP32 is now supported (#8462)
- Removed environment variable `PL_EXP_VERSION` from DDP subprocesses (#7403)
[1.4.0] - Fixed
- Fixed the `GPUStatsMonitor` callbacks to use the correct GPU IDs if `CUDA_VISIBLE_DEVICES` is set (#8260)
- Fixed `lr_scheduler` checkpointed state by calling `update_lr_schedulers` before saving checkpoints (#7877)
- Fixed ambiguous warning when both overfit and train dataloader shuffling are enabled (#7685)
- Fixed dev debugger memory growing due to tracking events even when disabled (#7875)
- Fixed `None` loss keys getting added in `training_epoch_end` when using manual optimization and not returning a loss (#7772)
- Fixed a bug where `precision=64` with `accelerator='ddp_spawn'` would throw a pickle error (#6924)
- Do not override the existing `epoch` value in `logged_metrics` when already logged by the user (#7982)
- Support for manual optimization with DeepSpeed (#7970)
- Fixed `dataloader_idx` argument value when predicting with only one `DataLoader` (#7941)
- Fixed passing the `stage` argument of `Callback.{setup,teardown}` as a keyword (#7973)
- Fixed metrics generated during `validation sanity checking` not being cleaned on end (#8171)
- Fixed `log_gpu_memory` metrics not being added to `logging` when nothing else is logged (#8174)
- Fixed a bug where calling `log` with a `Metric` instance would raise an error if it was a nested attribute of the model (#8181)
- Fixed a bug where using `precision=64` would cause buffers with complex dtype to be cast to real (#8208)
- Fixed `is_overridden` returning true for wrapped functions with no changes (#8296)
- Fixed a bug where `truncated_bptt_steps` would throw an AttributeError when the target RNN has multiple hidden states (#8145)
- Fixed `self.optimizers()` not returning a single optimizer if it had been wrapped (#8326)
- Fixed the `on_after_backward` hook not getting called when using manual optimization and no plugins (#8328)
- Fixed the `LightningModule.backward` hook only getting called with the `apex` plugin when using manual optimization (#8328)
- Fixed moving batch to device before sending it to the `on_*_batch_start`/`on_*_batch_end` callbacks and model hooks (#7378)
- Fixed passing a custom `DDPPlugin` when choosing `accelerator="ddp_cpu"` for the accelerator (#6208)
- Fixed missing call to `LightningModule.untoggle_optimizer` in training loop when running gradient accumulation with multiple optimizers (#8284)
- Fixed hash of LightningEnum to work with value instead of name (#8421)
- Fixed a bug where an extra checkpoint was saved at the end of training if the `val_check_interval` did not align with the number of training batches (#7724)
- Fixed `move_data_to_device` to return the batch if the object `to` function didn’t return `self` (#8433)
- Fixed progress bar updates for Pod Training (#8258)
- Fixed clearing dataloader references before attaching new dataloaders in consecutive `Trainer.{fit,validate,test,predict}` runs (#8442)
- Fixed memory leaks on GPU by moving `optimizer_states`, `ResultCollection.extra`, `ResultMetric` attributes, and `LoggerConnector` metrics to `cpu`. Also, delete the DDP wrapper on `teardown` (#8490)
- Fixed `SWA` callback using LightningModule `prevent_trainer_and_dataloaders_deepcopy` to avoid OOM (#8472)
- Fixed `ModelPruning` callback `on_save_checkpoint` to avoid making a `deepcopy` potentially leading to OOM (#8472)
- Fixed the sampler replacement logic for `DataLoader`s which do not define all `DataLoader` attributes as `__init__` parameters (#8519)
- Fixed DeepSpeed Windows support (#8488)
- Fixed DeepSpeed not properly setting the trainer `lr_schedulers` attribute (#8527)
- Fixed experiment version and log-dir divergence in DDP when using multiple `Trainer` instances in sequence (#7403)
- Enabled manual optimization for TPUs (#8458)
- Fixed `accumulate_grad_batches` not being recomputed during model reload (#5334)
- Fixed a `TypeError` when wrapping optimizers in the `HorovodPlugin` and running `Trainer.test` (#7840)
- Fixed `BackboneFinetuning` restoration (#8501)
- Fixed `lr_scheduler` with metric (e.g. `torch.optim.lr_scheduler.ReduceLROnPlateau`) when using `automatic_optimization = False` (#7643)
- Fixed `DeepSpeed` breaking with no schedulers (#8580)
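The LightningEnum hash fix in this list relates to a general Python rule: objects that compare equal must also hash equally, otherwise dictionary and set lookups miss. The sketch below illustrates hashing by value with hypothetical classes `SketchEnum` and `DeviceSketch`. It mimics the idea only and is not copied from Lightning's source; the case-insensitive comparison is an assumption.

```python
from enum import Enum


class SketchEnum(str, Enum):
    """Hypothetical enum base; illustrates the idea, not Lightning's `LightningEnum`."""

    def __eq__(self, other: object) -> bool:
        other = other.value if isinstance(other, Enum) else str(other)
        return self.value.lower() == other.lower()

    def __hash__(self) -> int:
        # Hash the value (not the member name) so equal objects hash equally.
        return hash(self.value.lower())


class DeviceSketch(SketchEnum):
    GPU = "gpu"
    CPU = "cpu"


lookup = {DeviceSketch.GPU: "cuda:0"}
print(lookup["gpu"])
```

Had `__hash__` used the member name (`"GPU"`), the plain string `"gpu"` would compare equal to the member but land in a different hash bucket, and the lookup would raise `KeyError`.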
[1.3.8] - 2021-07-01
[1.3.8] - Fixed
- Fixed a sync deadlock when checkpointing a `LightningModule` that uses a torchmetrics 0.4 `Metric` (#8218)
- Fixed compatibility with TorchMetrics v0.4 (#8206)
- Added torchelastic check when sanitizing GPUs (#8095)
- Fixed a DDP info message that was never shown (#8111)
- Fixed metrics deprecation message at module import level (#8163)
- Fixed a bug where an infinite recursion would be triggered when using the `BaseFinetuning` callback on a model that contains a `ModuleDict` (#8170)
- Added a mechanism to detect `deadlock` for `DDP` when only 1 process triggers an `Exception`. The mechanism will `kill the processes` when it happens (#8167)
- Fixed NCCL error when selecting non-consecutive device ids (#8165)
- Fixed SWA to also work with `IterableDataset` (#8172)
[1.3.7] - 2021-06-22
[1.3.7] - Fixed
- Fixed a bug where skipping an optimizer while using amp causes amp to trigger an assertion error (#7975)
- Fixed deprecation messages not showing due to incorrect stacklevel (#8002, #8005)
- Fixed setting a `DistributedSampler` when using a distributed plugin in a custom accelerator (#7814)
- Improved `PyTorchProfiler` chrome traces names (#8009)
- Fixed moving the best score to device in `EarlyStopping` callback for TPU devices (#7959)
- Fixed access to `callback_metrics` in ddp_spawn (#7916)
[1.3.6] - 2021-06-15
[1.3.6] - Fixed
- Fixed logs overwriting issue for remote filesystems (#7889)
- Fixed `DataModule.prepare_data` only being callable on the global rank 0 process (#7945)
- Fixed setting `worker_init_fn` to seed dataloaders correctly when using DDP (#7942)
- Fixed `BaseFinetuning` callback to properly handle parent modules with parameters (#7931)
[1.3.5] - 2021-06-08
[1.3.5] - Added
- Added warning to Training Step output (#7779)
[1.3.5] - Fixed
[1.3.5] - Changed
- Move `training_output` validation to after `train_step_end` (#7868)
[1.3.4] - 2021-06-01
[1.3.4] - Fixed
[1.3.3] - 2021-05-27
[1.3.3] - Changed
- Changed calling of `untoggle_optimizer(opt_idx)` out of the closure function (#7563)
[1.3.3] - Fixed
- Fixed `ProgressBar` pickling after calling `trainer.predict` (#7608)
- Fixed broadcasting in multi-node, multi-gpu DDP using torch 1.7 (#7592)
- Fixed dataloaders not being reset when tuning the model (#7566)
- Fixed print errors in `ProgressBar` when `trainer.fit` is not called (#7674)
- Fixed global step update when the epoch is skipped (#7677)
- Fixed training loop total batch counter when accumulate grad batches was enabled (#7692)
[1.3.2] - 2021-05-18
[1.3.2] - Changed
- `DataModule`s now avoid duplicate `{setup,teardown,prepare_data}` calls for the same stage (#7238)
[1.3.2] - Fixed
- Fixed parsing of multiple training dataloaders (#7433)
- Fixed recursive passing of `wrong_type` keyword argument in `pytorch_lightning.utilities.apply_to_collection` (#7433)
- Fixed setting correct `DistribType` for `ddp_cpu` (spawn) backend (#7492)
- Fixed incorrect number of calls to LR scheduler when `check_val_every_n_epoch > 1` (#7032)
[1.3.1] - 2021-05-11
[1.3.1] - Fixed
[1.3.0] - 2021-05-06
[1.3.0] - Added
- Added support for the `EarlyStopping` callback to run at the end of the training epoch (#6944)
- Added synchronization points before and after `setup` hooks are run (#7202)
- Added a `teardown` hook to `ClusterEnvironment` (#6942)
- Added utils for metrics to scalar conversions (#7180)
- Added utils for NaN/Inf detection for gradients and parameters (#6834)
- Added more explicit exception message when trying to execute `trainer.test()` or `trainer.validate()` with `fast_dev_run=True` (#6667)
- Added `LightningCLI` class to provide simple reproducibility with minimum boilerplate training CLI (#4492, #6862, #7156, #7299)
- Added `gradient_clip_algorithm` argument to Trainer for gradient clipping by value (#6123)
- Added a way to print to terminal without breaking up the progress bar (#5470)
- Added support to checkpoint after training steps in `ModelCheckpoint` callback (#6146)
- Added `TrainerStatus.{INITIALIZING,RUNNING,FINISHED,INTERRUPTED}` (#7173)
- Added `Trainer.validate()` method to perform one evaluation epoch over the validation set (#4948)
- Added `LightningEnvironment` for Lightning-specific DDP (#5915)
- Added `teardown()` hook to LightningDataModule (#4673)
- Added `auto_insert_metric_name` parameter to `ModelCheckpoint` (#6277)
- Added arg to `self.log` that enables users to give custom names when dealing with multiple dataloaders (#6274)
- Added `teardown` method to `BaseProfiler` to enable subclasses defining post-profiling steps outside of `__del__` (#6370)
- Added `setup` method to `BaseProfiler` to enable subclasses defining pre-profiling steps for every process (#6633)
- Added no return warning to predict (#6139)
- Added `Trainer.predict` config validation (#6543)
- Added `AbstractProfiler` interface (#6621)
- Added support for including module names for forward in the autograd trace of `PyTorchProfiler` (#6349)
- Added support for the PyTorch 1.8.1 autograd profiler (#6618)
- Added `outputs` parameter to callback’s `on_validation_epoch_end` & `on_test_epoch_end` hooks (#6120)
- Added `configure_sharded_model` hook (#6679)
- Added support for `precision=64`, enabling training with double precision (#6595)
- Added support for DDP communication hooks (#6736)
- Added `artifact_location` argument to `MLFlowLogger` which will be passed to the `MlflowClient.create_experiment` call (#6677)
- Added `model` parameter to precision plugins’ `clip_gradients` signature (#6764, #7231)
- Added `is_last_batch` attribute to `Trainer` (#6825)
- Added `LightningModule.lr_schedulers()` for manual optimization (#6567)
- Added `MpModelWrapper` in TPU Spawn (#7045)
- Added `max_time` Trainer argument to limit training time (#6823)
- Added `on_predict_{batch,epoch}_{start,end}` hooks (#7141)
- Added new `EarlyStopping` parameters `stopping_threshold` and `divergence_threshold` (#6868)
- Added `debug` flag to TPU Training Plugins (PT_XLA_DEBUG) (#7219)
- Added new `UnrepeatedDistributedSampler` and `IndexBatchSamplerWrapper` for tracking distributed predictions (#7215)
- Added `trainer.predict(return_predictions=None|False|True)` (#7215)
- Added `BasePredictionWriter` callback to implement prediction saving (#7127)
- Added `trainer.tune(scale_batch_size_kwargs, lr_find_kwargs)` arguments to configure the tuning algorithms (#7258)
- Added `tpu_distributed` check for TPU Spawn barrier (#7241)
- Added device updates to TPU Spawn for Pod training (#7243)
- Added warning when missing `Callback` and using `resume_from_checkpoint` (#7254)
- DeepSpeed single file saving (#6900)
- Added Training type Plugins Registry (#6982, #7063, #7214, #7224)
- Add `ignore` param to `save_hyperparameters` (#6056)
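The `gradient_clip_algorithm` entry in this list distinguishes clipping by value from clipping by norm. The plain-Python sketch below contrasts the two on a small gradient vector. The helper names are made up for illustration, and the code does not call PyTorch or Lightning.

```python
import math
from typing import List


def clip_by_value(grads: List[float], clip_val: float) -> List[float]:
    # Clamp every component independently to [-clip_val, clip_val].
    return [max(-clip_val, min(clip_val, g)) for g in grads]


def clip_by_norm(grads: List[float], max_norm: float) -> List[float]:
    # Rescale the whole vector so its L2 norm is at most max_norm.
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]


grads = [3.0, -4.0]
print(clip_by_value(grads, 1.0))
print(clip_by_norm(grads, 1.0))
```

Clipping by value can change the direction of the gradient, whereas clipping by norm preserves the direction and only shrinks the length.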
[1.3.0] - Changed
- Changed `LightningModule.truncated_bptt_steps` to be a property (#7323)
- Changed the `EarlyStopping` callback to no longer run `EarlyStopping.on_validation_end` by default if only training is run. Set `check_on_train_epoch_end` to run the callback at the end of the train epoch instead of at the end of the validation epoch (#7069)
- Renamed `pytorch_lightning.callbacks.swa` to `pytorch_lightning.callbacks.stochastic_weight_avg` (#6259)
- Refactor `RunningStage` and `TrainerState` usage (#4945, #7173)
  - Added `RunningStage.SANITY_CHECKING`
  - Added `TrainerFn.{FITTING,VALIDATING,TESTING,PREDICTING,TUNING}`
  - Changed `trainer.evaluating` to return `True` if validating or testing
- Changed `setup()` and `teardown()` stage argument to take any of `{fit,validate,test,predict}` (#6386)
- Changed profilers to save separate report files per state and rank (#6621)
- The trainer no longer tries to save a checkpoint on exception or run callback’s `on_train_end` functions (#6864)
- Changed `PyTorchProfiler` to use `torch.autograd.profiler.record_function` to record functions (#6349)
- Disabled `lr_scheduler.step()` in manual optimization (#6825)
- Changed warnings and recommendations for dataloaders in `ddp_spawn` (#6762)
- `pl.seed_everything` will now also set the seed on the `DistributedSampler` (#7024)
- Changed default setting for communication of multi-node training using `DDPShardedPlugin` (#6937)
- `trainer.tune()` now returns the tuning result (#7258)
- `LightningModule.from_datasets()` now accepts `IterableDataset` instances as training datasets (#7503)
- Changed `resume_from_checkpoint` warning to an error when the checkpoint file does not exist (#7075)
- Automatically set `sync_batchnorm` for `training_type_plugin` (#6536)
- Allowed training type plugin to delay optimizer creation (#6331)
- Removed ModelSummary validation from train loop on_trainer_init (#6610)
- Moved `save_function` to accelerator (#6689)
- Improved verbose logging for `EarlyStopping` callback (#6811)
- Run ddp_spawn dataloader checks on Windows (#6930)
- Updated mlflow to use `resolve_tags` (#6746)
- Moved `save_hyperparameters` to its own function (#7119)
- Replaced `_DataModuleWrapper` with `__new__` (#7289)
- Reset `current_fx` properties on lightning module in teardown (#7247)
- Auto-set `DataLoader.worker_init_fn` with `seed_everything` (#6960)
- Remove `model.trainer` call inside of dataloading mixin (#7317)
- Split profilers module (#6261)
- Ensure accelerator is valid if running interactively (#5970)
- Disabled batch transfer in DP mode (#6098)
[1.3.0] - Deprecated
- Deprecated `outputs` in both `LightningModule.on_train_epoch_end` and `Callback.on_train_epoch_end` hooks (#7339)
- Deprecated `Trainer.truncated_bptt_steps` in favor of `LightningModule.truncated_bptt_steps` (#7323)
- Deprecated `LightningModule.grad_norm` in favor of `pytorch_lightning.utilities.grads.grad_norm` (#7292)
- Deprecated the `save_function` property from the `ModelCheckpoint` callback (#7201)
- Deprecated `LightningModule.write_predictions` and `LightningModule.write_predictions_dict` (#7066)
- Deprecated `TrainerLoggingMixin` in favor of a separate utilities module for metric handling (#7180)
- Deprecated `TrainerTrainingTricksMixin` in favor of a separate utilities module for NaN/Inf detection for gradients and parameters (#6834)
- `period` has been deprecated in favor of `every_n_val_epochs` in the `ModelCheckpoint` callback (#6146)
- Deprecated `trainer.running_sanity_check` in favor of `trainer.sanity_checking` (#4945)
- Deprecated `Profiler(output_filename)` in favor of `dirpath` and `filename` (#6621)
- Deprecated `PytorchProfiler(profiled_functions)` in favor of `record_functions` (#6349)
- Deprecated `@auto_move_data` in favor of `trainer.predict` (#6993)
- Deprecated `Callback.on_load_checkpoint(checkpoint)` in favor of `Callback.on_load_checkpoint(trainer, pl_module, checkpoint)` (#7253)
- Deprecated metrics in favor of `torchmetrics` (#6505, #6530, #6540, #6547, #6515, #6572, #6573, #6584, #6636, #6637, #6649, #6659, #7131)
- Deprecated the `LightningModule.datamodule` getter and setter methods; access them through `Trainer.datamodule` instead (#7168)
- Deprecated the use of `Trainer(gpus="i")` (string) for selecting the i-th GPU; from v1.5 this will set the number of GPUs instead of the index (#6388)
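The last deprecation above changes what a string value means, which is easy to misread. The sketch below contrasts the two readings in plain Python. `gpus_string_as_index` and `gpus_string_as_count` are hypothetical helpers written for illustration, not Lightning functions, and mapping a count of n to device indices 0..n-1 is an assumption made for this example.

```python
from typing import List


def gpus_string_as_index(gpus: str) -> List[int]:
    """Deprecated reading: the string "i" selects the GPU with index i."""
    return [int(gpus)]


def gpus_string_as_count(gpus: str) -> List[int]:
    """Reading announced for v1.5: the string "n" requests n GPUs.

    Assumption for illustration: n GPUs means device indices 0..n-1.
    """
    return list(range(int(gpus)))


# The same argument value resolves to different devices under each reading.
print(gpus_string_as_index("3"))
print(gpus_string_as_count("3"))
```

Code that relies on the string form to pick one device can likely pass a list such as `gpus=[3]` instead, which names device indices explicitly.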
[1.3.0] - Removed¶
- Removed the `exp_save_path` property from the `LightningModule` (#7266)
- Removed training loop explicitly calling `EarlyStopping.on_validation_end` if no validation is run (#7069)
- Removed `automatic_optimization` as a property from the training loop in favor of `LightningModule.automatic_optimization` (#7130)
- Removed evaluation loop legacy returns for `*_epoch_end` hooks (#6973)
- Removed support for passing a bool value to `profiler` argument of Trainer (#6164)
- Removed no return warning from val/test step (#6139) 
- Removed passing a `ModelCheckpoint` instance to `Trainer(checkpoint_callback)` (#6166)
- Removed deprecated Trainer argument `enable_pl_optimizer` and `automatic_optimization` (#6163)
- Removed deprecated metrics (#6161)
  - from `pytorch_lightning.metrics.functional.classification` removed `to_onehot`, `to_categorical`, `get_num_classes`, `roc`, `multiclass_roc`, `average_precision`, `precision_recall_curve`, `multiclass_precision_recall_curve`
  - from `pytorch_lightning.metrics.functional.reduction` removed `reduce`, `class_reduce`
- Removed deprecated `ModelCheckpoint` arguments `prefix`, `mode="auto"` (#6162)
- Removed `mode='auto'` from `EarlyStopping` (#6167)
- Removed `epoch` and `step` arguments from `ModelCheckpoint.format_checkpoint_name()`; these are now included in the `metrics` argument (#7344)
- Removed legacy references for magic keys in the `Result` object (#6016)
- Removed deprecated `LightningModule` `hparams` setter (#6207)
- Removed legacy code to log or include metrics in the progress bar by returning them in a dict with the `"log"/"progress_bar"` magic keys. Use `self.log` instead (#6734)
- Removed `trainer.fit()` return value of `1`. It has no return now (#7237)
- Removed `logger_connector` legacy code (#6733)
- Removed unused mixin attributes (#6487) 
[1.3.0] - Fixed¶
- Fixed NaN errors in progress bars when training with iterable datasets with no length defined (#7306) 
- Fixed attaching train and validation dataloaders when `reload_dataloaders_every_epoch=True` and `num_sanity_val_steps=0` (#7207)
- Added a barrier in the accelerator `teardown` to synchronize processes before execution finishes (#6814)
- Fixed multi-node DDP sub-process launch by using `local_rank` instead of `global_rank` for main process assertion (#7061)
- Fixed incorrect removal of `WORLD_SIZE` environment variable in DDP training when launching with torch distributed/torchelastic (#6942)
- Made the `Plugin.reduce` method more consistent across all Plugins to reflect a mean-reduction by default (#6011)
- Move lightning module to correct device type when using LightningDistributedWrapper (#6070) 
- Do not print top-k verbose log with `ModelCheckpoint(monitor=None)` (#6109)
- Fixed `ModelCheckpoint(save_top_k=0, save_last=True)` not saving the `last` checkpoint (#6136)
- Fixed `.teardown(stage='fit')` and `.on_fit_{start,end}()` getting called during `trainer.test` (#6386)
- Fixed LightningModule `all_gather` on cpu tensors (#6416)
- Fixed torch distributed not available in setup hook for DDP (#6506) 
- Fixed `trainer.tuner.{lr_find,scale_batch_size}` not setting the `Trainer` state properly (#7258)
- Fixed bug where the learning rate schedulers did not follow the optimizer frequencies (#4868) 
- Fixed pickle error checker to now check for `pickle.PickleError` to catch all pickle errors (#6917)
- Fixed a bug where the outputs object passed to `LightningModule.training_epoch_end` was different from the object passed to the `on_train_epoch_end` hook (#6969)
- Fixed a bug where the outputs passed to `train_batch_end` would be lists even when using a single optimizer and no truncated backprop through time steps (#6969)
- Fixed bug for trainer error handling which would cause hang for distributed training (#6864) 
- Fixed `self.device` not returning the correct device in replicas of data-parallel (#6414)
- Fixed `lr_find` trying beyond `num_training` steps and suggesting a too high learning rate (#7076)
- Fixed logger creating incorrect version folder in DDP with repeated `Trainer.fit` calls (#7077)
- Fixed metric objects passed directly to `self.log` not being reset correctly (#7055)
- Fixed `CombinedLoader` in distributed settings for validation / testing (#7102)
- Fixed the save_dir in `WandbLogger` when the run was initiated externally (#7106)
- Fixed `num_sanity_val_steps` affecting reproducibility of training data shuffling (#7014)
- Fixed resetting device after fitting/evaluating/predicting (#7188)
- Fixed bug where `trainer.tuner.scale_batch_size(max_trials=0)` would not return the correct batch size result (#7262)
- Fixed metrics not being properly logged with `precision=16` and `manual_optimization` (#7228)
- Fixed `BaseFinetuning` properly reloading `optimizer_states` when using `resume_from_checkpoint` (#6891)
- Fixed `parameters_to_ignore` not properly set to DDPWrapper (#7239)
- Fixed parsing of `fast_dev_run=True` with the built-in `ArgumentParser` (#7240)
- Fixed handling an `IterableDataset` that fails to produce a batch at the beginning of an epoch (#7294)
- Fixed `LightningModule.save_hyperparameters()` when attempting to save an empty container (#7268)
- Fixed `apex` not properly instantiated when running with `ddp` (#7274)
- Fixed optimizer `state` not moved to GPU (#7277)
- Fixed custom init args for `WandbLogger` (#6989)
- Fixed a bug where an error would be raised if the train dataloader sometimes produced None for a batch (#7342) 
- Fixed examples (#6600, #6638, #7096, #7246, #6357, #6476, #6294, #6373, #6088, #7398)
- Resolved schedule step bug for PyTorch Profiler (#6674, #6681) 
- Updated logic for checking TPUs availability (#6767) 
- Resolve TPU miss rendezvous (#6781) 
- Fixed auto-scaling mode when calling tune method on trainer (#7321) 
- Fixed finetuning so that complex models are correctly unfrozen (#6880)
- Ensure we set the eval/train flag correctly on accelerator model (#6877) 
- Set better defaults for `rank_zero_only.rank` when training is launched with SLURM and torchelastic (#6802)
- Fixed matching the number of outputs of backward with forward for AllGatherGrad (#6625) 
- Fixed `gradient_clip_algorithm` having no effect (#6928)
- Fixed CUDA OOM detection and handling (#6934)
- Fixed `unfreeze_and_add_param_group` to expect `modules` rather than `module` (#6822)
- Fixed DDP + SyncBN when move on device (#6838)
- Fixed missing arguments in `lr_find` call (#6784)
- Fixed `set_default_tensor_type` to `torch.DoubleTensor` with precision=64 (#7108)
- Fixed `NeptuneLogger.log_text(step=None)` (#7194)
[1.2.9] - 2021-04-20¶
[1.2.9] - Fixed¶
[1.2.8] - 2021-04-14¶
[1.2.8] - Added¶
- Added TPUSpawn + IterableDataset error message (#6875) 
[1.2.8] - Fixed¶
- Fixed process rank not being available right away after `Trainer` instantiation (#6941)
- Fixed `sync_dist` for tpus (#6950)
- Fixed `AttributeError` for `require_backward_grad_sync` when running manual optimization with sharded plugin (#6915)
- Fixed `--gpus` default for parser returned by `Trainer.add_argparse_args` (#6898)
- Fixed TPU Spawn all gather (#6896)
- Fixed `EarlyStopping` logic when `min_epochs` or `min_steps` requirement is not met (#6705)
- Fixed csv extension check (#6436) 
- Fixed checkpoint issue when using Horovod distributed backend (#6958) 
- Fixed tensorboard exception raising (#6901) 
- Fixed setting the eval/train flag correctly on accelerator model (#6983) 
- Fixed DDP_SPAWN compatibility with bug_report_model.py (#6892) 
- Fixed bug where `BaseFinetuning.flatten_modules()` was duplicating leaf node parameters (#6879)
- Set better defaults for `rank_zero_only.rank` when training is launched with SLURM and torchelastic:
[1.2.7] - 2021-04-06¶
[1.2.7] - Fixed¶
- Fixed a bug with omegaconf and xm.save (#6741)
- Fixed an issue with IterableDataset when len is not defined (#6828) 
- Sanitize None params during pruning (#6836) 
- Enforce an epoch scheduler interval when using SWA (#6588) 
- Fixed TPU Colab hang issue, post training (#6816) 
- Fixed a bug where `TensorBoardLogger` would give a warning and not log correctly to a symbolic link `save_dir` (#6730)
- Fixed bug where `predict` could not be used when `progress_bar_refresh_rate=0` (#6884)
[1.2.6] - 2021-03-30¶
[1.2.6] - Changed¶
- Changed the behavior of `on_epoch_start` to run at the beginning of validation & test epoch (#6498)
[1.2.6] - Removed¶
- Removed legacy code to include `step` dictionary returns in `callback_metrics`. Use `self.log_dict` instead. (#6682)
[1.2.6] - Fixed¶
- Fixed `DummyLogger.log_hyperparams` raising a `TypeError` when running with `fast_dev_run=True` (#6398)
- Fixed error on TPUs when there was no `ModelCheckpoint` (#6654)
- Fixed `trainer.test` freeze on TPUs (#6654)
- Fixed a bug where gradients were disabled after calling `Trainer.predict` (#6657)
- Fixed bug where no TPUs were detected in a TPU pod env (#6719) 
[1.2.5] - 2021-03-23¶
[1.2.5] - Changed¶
[1.2.5] - Fixed¶
[1.2.4] - 2021-03-16¶
[1.2.4] - Changed¶
- Changed the default of `find_unused_parameters` back to `True` in DDP and DDP Spawn (#6438)
[1.2.4] - Fixed¶
- Expose DeepSpeed loss parameters to allow users to fix loss instability (#6115) 
- Fixed DP reduction with collection (#6324) 
- Fixed an issue where the tuner would not tune the learning rate if also tuning the batch size (#4688) 
- Fixed broadcast to use PyTorch `broadcast_object_list` and add `reduce_decision` (#6410)
- Fixed logger creating directory structure too early in DDP (#6380) 
- Fixed DeepSpeed additional memory use on rank 0 when default device not set early enough (#6460) 
- Fixed an issue with `Tuner.scale_batch_size` not finding the batch size attribute in the datamodule (#5968)
- Fixed an exception in the layer summary when the model contains torch.jit scripted submodules (#6511)
- Fixed the train loop config being run during `Trainer.predict` (#6541)
[1.2.3] - 2021-03-09¶
[1.2.3] - Fixed¶
- Fixed `ModelPruning(make_pruning_permanent=True)` pruning buffers getting removed when saved during training (#6073)
- Fixed `_stable_1d_sort` to work when `n >= N` (#6177)
- Fixed `AttributeError` when `logger=None` on TPU (#6221)
- Fixed PyTorch Profiler with `emit_nvtx` (#6260)
- Fixed `trainer.test` from `best_path` hanging after calling `trainer.fit` (#6272)
- Fixed `SingleTPU` calling `all_gather` (#6296)
- Ensure we check DeepSpeed/Sharded in multi-node DDP (#6297)
- Check `LightningOptimizer` doesn’t delete optimizer hooks (#6305)
- Resolve memory leak for evaluation (#6326)
- Ensure that clip gradients is only called if the value is greater than 0 (#6330)
- Fixed `Trainer` not resetting `lightning_optimizers` when calling `Trainer.fit()` multiple times (#6372)
[1.2.2] - 2021-03-02¶
[1.2.2] - Added¶
- Added `checkpoint` parameter to callback’s `on_save_checkpoint` hook (#6072)
[1.2.2] - Changed¶
[1.2.2] - Fixed¶
- Fixed epoch level schedulers not being called when `val_check_interval < 1.0` (#6075)
- Fixed multiple early stopping callbacks (#6197)
- Fixed incorrect usage of `detach()`, `cpu()`, `to()` (#6216)
- Fixed LBFGS optimizer support which didn’t converge in automatic optimization (#6147)
- Prevent `WandbLogger` from dropping values (#5931)
- Fixed error thrown when using valid distributed mode in multi node (#6297)
[1.2.1] - 2021-02-23¶
[1.2.1] - Fixed¶
[1.2.0] - 2021-02-18¶
[1.2.0] - Added¶
- Added `DataType`, `AverageMethod` and `MDMCAverageMethod` enum in metrics (#5657)
- Added support for summarized model total params size in megabytes (#5590) 
- Added support for multiple train loaders (#1959) 
- The `Accuracy` metric now generalizes to Top-k accuracy for (multi-dimensional) multi-class inputs using the `top_k` parameter (#4838)
- The `Accuracy` metric now enables the computation of subset accuracy for multi-label or multi-dimensional multi-class inputs with the `subset_accuracy` parameter (#4838)
- Added `HammingDistance` metric to compute the hamming distance (loss) (#4838)
- Added `max_fpr` parameter to `auroc` metric for computing partial auroc metric (#3790)
- Added `StatScores` metric to compute the number of true positives, false positives, true negatives and false negatives (#4839)
- Added `R2Score` metric (#5241)
- Added `LambdaCallback` (#5347)
- Added `BackboneLambdaFinetuningCallback` (#5377)
- Accelerator `all_gather` supports collection (#5221)
- Added `image_gradients` functional metric to compute the image gradients of a given input image. (#5056)
- Added `MetricCollection` (#4318)
- Added `.clone()` method to metrics (#4318)
- Added `IoU` class interface (#4704)
- Support to tie weights after moving model to TPU via `on_post_move_to_device` hook
- Added missing val/test hooks in `LightningModule` (#5467)
- The `Recall` and `Precision` metrics (and their functional counterparts `recall` and `precision`) can now be generalized to Recall@K and Precision@K with the use of `top_k` parameter (#4842)
- Added `PyTorchProfiler` (#5560)
- Added compositional metrics (#5464) 
- Added Trainer method `predict(...)` for high performance predictions (#5579)
- Added `on_before_batch_transfer` and `on_after_batch_transfer` data hooks (#3671)
- Added AUC/AUROC class interface (#5479)
- Added `PredictLoop` object (#5752)
- Added `LightningModule.configure_callbacks` to enable the definition of model-specific callbacks (#5621)
- Added `dim` to `PSNR` metric for mean-squared-error reduction (#5957)
- Added proximal policy optimization template to pl_examples (#5394)
- Added `log_graph` to `CometLogger` (#5295)
- Added possibility for nested loaders (#5404) 
- Added `sync_step` to Wandb logger (#5351)
- Added `StochasticWeightAveraging` callback (#5640)
- Added `LightningDataModule.from_datasets(...)` (#5133)
- Added `PL_TORCH_DISTRIBUTED_BACKEND` env variable to select backend (#5981)
- Added `Trainer` flag to activate Stochastic Weight Averaging (SWA) `Trainer(stochastic_weight_avg=True)` (#6038)
[1.2.0] - Changed¶
- The `stat_scores` metric now calculates stat scores over all classes and gains new parameters, in line with the new `StatScores` metric (#4839)
- Changed `computer_vision_fine_tunning` example to use `BackboneLambdaFinetuningCallback` (#5377)
- Changed automatic casting for LoggerConnector `metrics` (#5218)
- Changed `iou` [func] to allow float input (#4704)
- Metric `compute()` method will no longer automatically call `reset()` (#5409)
- Set PyTorch 1.4 as min requirements, also for testing and examples `torchvision>=0.5` and `torchtext>=0.5` (#5418)
- Changed `callbacks` argument in `Trainer` to allow `Callback` input (#5446)
- Changed the default of `find_unused_parameters` to `False` in DDP (#5185)
- Changed `ModelCheckpoint` version suffixes to start at 1 (#5008)
- Progress bar metrics tensors are now converted to float (#5692) 
- Changed the default value for the `progress_bar_refresh_rate` Trainer argument in Google COLAB notebooks to 20 (#5516)
- Extended support for purely iteration-based training (#5726) 
- Made `LightningModule.global_rank`, `LightningModule.local_rank` and `LightningModule.logger` read-only properties (#5730)
- Forced `ModelCheckpoint` callbacks to run after all others to guarantee all states are saved to the checkpoint (#5731)
- Refactored Accelerators and Plugins:
  - Added base classes for plugins (#5715)
  - Added parallel plugins for DP, DDP, DDPSpawn, DDP2 and Horovod (#5714)
  - Precision Plugins (#5718)
  - Added new Accelerators for CPU, GPU and TPU (#5719)
  - Added RPC and Sharded plugins (#5732)
  - Added missing `LightningModule`-wrapper logic to new plugins and accelerator (#5734)
  - Moved device-specific teardown logic from training loop to accelerator (#5973)
  - Moved accelerator_connector.py to the connectors subfolder (#6033)
  - Trainer only references accelerator (#6039)
  - Made parallel devices optional across all plugins (#6051)
- Enabled `self.log` in callbacks (#5094)
- Renamed xxx_AVAILABLE as protected (#5082) 
- Unified module names in Utils (#5199) 
- Refactor: clean trainer device & distributed getters (#5300) 
- Simplified training phase as LightningEnum (#5419) 
- Updated metrics to use LightningEnum (#5689) 
- Changed the sequence of the `on_train_batch_end`, `on_batch_end` & `on_train_epoch_end`, `on_epoch_end` hooks (#5688)
- Refactored `setup_training` and removed `test_mode` (#5388)
- Disabled training with zero `num_training_batches` when insufficient `limit_train_batches` (#5703)
- Refactored `EpochResultStore` (#5522)
- Update `lr_finder` to check for attribute if not running `fast_dev_run` (#5990)
- LightningOptimizer manual optimizer is more flexible and exposes `toggle_model` (#5771)
- `MlflowLogger` limits parameter value length to 250 char (#5893)
- Re-introduced fix for Hydra directory sync with multiple process (#5993) 
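The `compute()`/`reset()` change listed above (#5409) affects any loop that reuses one metric object across epochs. The following is a minimal sketch with a hand-written `RunningMean` class, a hypothetical stand-in rather than a Lightning metric, showing that state keeps accumulating until `reset()` is called explicitly.

```python
class RunningMean:
    """Toy metric (hypothetical, not part of Lightning) with update/compute/reset."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Clear the accumulated state.
        self.total = 0.0
        self.count = 0

    def update(self, value: float) -> None:
        self.total += value
        self.count += 1

    def compute(self) -> float:
        # Mirrors the behaviour described in the entry: no implicit reset here.
        return self.total / self.count


metric = RunningMean()
metric.update(2.0)
metric.update(4.0)
first = metric.compute()   # mean of 2 and 4

metric.update(6.0)
second = metric.compute()  # still includes 2 and 4, because state was kept

metric.reset()             # explicit reset starts a fresh accumulation
metric.update(10.0)
third = metric.compute()
```

Under the earlier behaviour the second `compute()` would have seen only the value 6, so code that depended on the implicit reset would need an explicit `reset()` call at the epoch boundary.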
[1.2.0] - Deprecated¶
- Function `stat_scores_multiple_classes` is deprecated in favor of `stat_scores` (#4839)
- Moved accelerators and plugins to its `legacy` pkg (#5645)
- Deprecated `LightningDistributedDataParallel` in favor of new wrapper module `LightningDistributedModule` (#5185)
- Deprecated `LightningDataParallel` in favor of new wrapper module `LightningParallelModule` (#5670)
- Renamed utils modules (#5199)
  - `argparse_utils` >> `argparse`
  - `model_utils` >> `model_helpers`
  - `warning_utils` >> `warnings`
  - `xla_device_utils` >> `xla_device`
- Deprecated using `'val_loss'` to set the `ModelCheckpoint` monitor (#6012)
- Deprecated `.get_model()` with explicit `.lightning_module` property (#6035)
- Deprecated Trainer attribute `accelerator_backend` in favor of `accelerator` (#6034)
[1.2.0] - Removed¶
[1.2.0] - Fixed¶
- Fixed distributed setting and `ddp_cpu` only with `num_processes>1` (#5297)
- Fixed `num_workers` for Windows example (#5375)
- Fixed loading yaml (#5619) 
- Fixed support custom DataLoader with DDP if they can be re-instantiated (#5745) 
- Fixed repeated `.fit()` calls ignoring the max_steps iteration bound (#5936)
- Fixed throwing `MisconfigurationError` on unknown mode (#5255)
- Resolve bug with Finetuning (#5744) 
- Fixed `ModelCheckpoint` race condition in file existence check (#5155)
- Fixed some compatibility with PyTorch 1.8 (#5864) 
- Fixed forward cache (#5895) 
- Fixed recursive detach of tensors to CPU (#6007) 
- Fixed passing wrong strings for scheduler interval not throwing an error (#5923)
- Fixed wrong `requires_grad` state after `return None` with multiple optimizers (#5738)
- Added the `on_epoch_end` hook at the end of `validation`, `test` epoch (#5986)
- Fixed missing `process_dataloader` call for `TPUSpawn` when in distributed mode (#6015)
- Fixed progress bar flickering by appending 0 to floats/strings (#6009) 
- Fixed synchronization issues with TPU training (#6027) 
- Fixed `hparams.yaml` saved twice when using `TensorBoardLogger` (#5953)
- Fixed `fairscale` compatibility with PT 1.8 (#5996)
- Ensured `process_dataloader` is called when `tpu_cores > 1` to use Parallel DataLoader (#6015)
- Attempted SLURM auto resume call when non-shell call fails (#6002) 
- Fixed wrapping optimizers upon assignment (#6006) 
- Fixed allowing hashing of metrics with lists in their state (#5939) 
[1.1.8] - 2021-02-08¶
[1.1.8] - Fixed¶
[1.1.7] - 2021-02-03¶
[1.1.7] - Fixed¶
- Fixed `TensorBoardLogger` not closing `SummaryWriter` on `finalize` (#5696)
- Fixed filtering of pytorch “unsqueeze” warning when using DP (#5622) 
- Fixed `num_classes` argument in F1 metric (#5663)
- Fixed `log_dir` property (#5537)
- Fixed a race condition in `ModelCheckpoint` when checking if a checkpoint file exists (#5144)
- Remove unnecessary intermediate layers in Dockerfiles (#5697) 
- Fixed auto learning rate ordering (#5638) 
[1.1.6] - 2021-01-26¶
[1.1.6] - Changed¶
[1.1.6] - Fixed¶
- Fixed `toggle_optimizer` to reset `requires_grad` state (#5574)
- Fixed FileNotFoundError for best checkpoint when using DDP with Hydra (#5629) 
- Fixed an error when logging a progress bar metric with a reserved name (#5620) 
- Fixed `Metric`’s `state_dict` not included when child modules (#5614)
- Fixed Neptune logger creating multiple experiments when GPUs > 1 (#3256) 
- Fixed duplicate logs appearing in console when using the python logging module (#5509) 
- Fixed tensor printing in `trainer.test()` (#5138)
- Fixed not using dataloader when `hparams` present (#4559)
[1.1.5] - 2021-01-19¶
[1.1.5] - Fixed¶
[1.1.4] - 2021-01-12¶
[1.1.4] - Added¶
- Add automatic optimization property setter to lightning module (#5169) 
[1.1.4] - Changed¶
- Changed deprecated `enable_pl_optimizer=True` (#5244)
[1.1.4] - Fixed¶
- Fixed `transfer_batch_to_device` for DDP with `len(devices_ids) == 1` (#5195)
- Logging only on `not should_accumulate()` during training (#5417)
- Resolve interpolation bug with Hydra (#5406) 
- Check environ before selecting a seed to prevent warning message (#4743) 
- Fixed signature mismatch in `model_to_device` of `DDPCPUHPCAccelerator` (#5505)
[1.1.3] - 2021-01-05¶
[1.1.3] - Added¶
[1.1.3] - Changed¶
[1.1.3] - Fixed¶
- Fixed `trainer.test` returning non-test metrics (#5214)
- Fixed metric state reset (#5273) 
- Fixed `--num-nodes` on `DDPSequentialPlugin` (#5327)
- Fixed invalid value for `weights_summary` (#5296)
- Fixed `Trainer.test` not using the latest `best_model_path` (#5161)
- Fixed existence check for hparams not using underlying filesystem (#5250) 
- Fixed `LightningOptimizer` AMP bug (#5191)
- Fixed casted key to string in `_flatten_dict` (#5354)
[1.1.2] - 2020-12-23¶
[1.1.2] - Added¶
[1.1.2] - Removed¶
- `enable_pl_optimizer=False` by default to temporarily fix AMP issues (#5163)
[1.1.2] - Fixed¶
- Metric reduction with Logging (#5150) 
- Remove nan loss in manual optimization (#5121) 
- Un-balanced logging properly supported (#5119) 
- Fix hanging in DDP HPC accelerators (#5157) 
- Fix reset `TensorRunningAccum` (#5106)
- Updated `DALIClassificationLoader` to not use deprecated arguments (#4925)
- Corrected call to `torch.no_grad` (#5124)
[1.1.1] - 2020-12-15¶
[1.1.1] - Added¶
- Add a notebook example to reach a quick baseline of ~94% accuracy on CIFAR10 using Resnet in Lightning (#4818) 
[1.1.1] - Changed¶
[1.1.1] - Removed¶
[1.1.1] - Fixed¶
- Fixed trainer by default `None` in `DDPAccelerator` (#4915)
- Fixed `LightningOptimizer` to expose optimizer attributes (#5095)
- Do not warn when the `name` key is used in the `lr_scheduler` dict (#5057)
- Check if optimizer supports closure (#4981) 
- Add deprecated metric utility functions back to functional (#5067, #5068)
- Allow any input in `to_onnx` and `to_torchscript` (#4378)
- Fixed `DDPHPCAccelerator` hangs in DDP construction by calling `init_device` (#5157)
[1.1.0] - 2020-12-09¶
[1.1.0] - Added¶
- Added “monitor” key to saved `ModelCheckpoints` (#4383)
- Added `ConfusionMatrix` class interface (#4348)
- Added multiclass AUROC metric (#4236) 
- Added global step indexing to the checkpoint name for a better sub-epoch checkpointing experience (#3807) 
- Added optimizer hooks in callbacks (#4379) 
- Added option to log momentum (#4384) 
- Added `current_score` to `ModelCheckpoint.on_save_checkpoint` (#4721)
- Added logging using `self.log` in train and evaluation for epoch end hooks (#4552, #4495, #4439, #4684, #4913)
- Added ability for DDP plugin to modify optimizer state saving (#4675) 
- Added `prefix` argument in loggers (#4557)
- Added printing of total num of params, trainable and non-trainable params in ModelSummary (#4521) 
- Added `PrecisionRecallCurve`, `ROC`, `AveragePrecision` class metrics (#4549)
- Added custom `Apex` and `NativeAMP` as Precision plugins (#4355)
- Added DALI MNIST example (#3721)
- Added sharded plugin for DDP for multi-gpu training memory optimizations (#4639, #4686, #4737, #4773)
- Added `experiment_id` to the NeptuneLogger (#3462)
- Added Pytorch Geometric integration example with Lightning (#4568)
- Added `all_gather` method to `LightningModule` which allows gradient based tensor synchronizations for use-cases such as negative sampling. (#5012)
- Enabled `self.log` in most functions (#4969)
- Added changeable extension variable for `ModelCheckpoint` (#4977)
[1.1.0] - Changed¶
- Tuner algorithms will be skipped if `fast_dev_run=True` (#3903)
- `WandbLogger` does not force wandb `reinit` arg to True anymore and creates a run only when needed (#4648)
- Changed `automatic_optimization` to be a model attribute (#4602)
- Changed Simple Profiler report to order by percentage time spent + num calls (#4880)
- Simplify optimization Logic (#4984) 
- Classification metrics overhaul (#4837) 
- Updated `fast_dev_run` to accept integer representing num_batches (#4629)
- Refactored optimizer (#4658) 
[1.1.0] - Deprecated¶
[1.1.0] - Removed¶
[1.1.0] - Fixed¶
- Added feature to move tensors to CPU before saving (#4309) 
- Fixed `LoggerConnector` to have logged metrics on root device in DP (#4138)
- Auto convert tensors to contiguous format when `gather_all` (#4907)
- Fixed `PYTHONPATH` for ddp test model (#4528)
- Fixed allowing logger to support indexing (#4595) 
- Fixed DDP and manual_optimization (#4976) 
[1.0.8] - 2020-11-24¶
[1.0.8] - Added¶
[1.0.8] - Changed¶
- Consistently use `step=trainer.global_step` in `LearningRateMonitor` independently of `logging_interval` (#4376)
- Metric states are no longer added to `state_dict` by default (#4685)
- Renamed class metric `Fbeta` >> `FBeta` (#4656)
- Model summary: add 1 decimal place (#4745)
- Do not override `PYTHONWARNINGS` (#4700)
- Moved `init_ddp_connection` from `DDP` to `DDPPlugin` (#4407)
[1.0.8] - Fixed¶
- Fixed checkpoint `hparams` dict casting when `omegaconf` is available (#4770)
- Fixed incomplete progress bars when total batches not divisible by refresh rate (#4577) 
- Updated SSIM metric (#4566) 
- Fixed `batch_arg_name` bug by adding `batch_arg_name` to all calls to `_adjust_batch_size` (#4812)
- Fixed `torchtext` data to GPU (#4785)
- Fixed a crash bug in MLFlow logger (#4716) 
[1.0.7] - 2020-11-17¶
[1.0.7] - Added¶
- Added lambda closure to `manual_optimizer_step` (#4618)
[1.0.7] - Changed¶
[1.0.7] - Fixed¶
- Prevent crash if `sync_dist=True` on CPU (#4626)
- Fixed average pbar Metrics (#4534) 
- Fixed `setup` callback hook to correctly pass the LightningModule through (#4608)
- Allowing decorate model init with saving `hparams` inside (#4662)
- Fixed `split_idx` set by `LoggerConnector` in `on_trainer_init` to `Trainer` (#4697)
[1.0.6] - 2020-11-11¶
[1.0.6] - Added¶
- Added metrics aggregation in Horovod and fixed early stopping (#3775) 
- Added `manual_optimizer_step` which works with AMP Native and `accumulated_grad_batches` (#4485)
- Added `persistent(mode)` method to metrics, to enable and disable metric states being added to `state_dict` (#4482)
- Added congratulations at the end of our notebooks (#4555) 
- Added parameter `move_metrics_to_cpu` in Trainer to disable gpu leak (#4592)
[1.0.6] - Changed¶
[1.0.6] - Fixed¶
- Fixed feature-lack in `hpc_load` (#4526)
- Fixed metrics states being overridden in DDP mode (#4482) 
- Fixed `lightning_getattr`, `lightning_hasattr` not finding the correct attributes in datamodule (#4347)
- Fixed automatic optimization AMP by `manual_optimization_step` (#4485)
- Replace `MisconfigurationException` with warning in `ModelCheckpoint` Callback (#4560)
- Fixed logged keys in mlflow logger (#4412) 
- Fixed `is_picklable` by catching `AttributeError` (#4508)
- Fixed multi test dataloaders dict `AttributeError` error (#4480)
- Fixed show progress bar only for `progress_rank 0` on `DDP_SLURM` (#4437)
[1.0.5] - 2020-11-03¶
[1.0.5] - Added¶
[1.0.5] - Changed¶
- W&B log in sync with `Trainer` step (#4405)
- Hook `on_after_backward` is called only when `optimizer_step` is being called (#4439)
- Moved `track_and_norm_grad` into training loop and called only when `optimizer_step` is being called (#4439)
- Changed type checker with explicit cast of `ref_model` object (#4457)
- Changed `distributed_backend` -> `accelerator` (#4429)
[1.0.5] - Deprecated¶
- Deprecated passing `ModelCheckpoint` instance to `checkpoint_callback` Trainer argument (#4336)
[1.0.5] - Fixed¶
- Disable saving checkpoints if not trained (#4372) 
- Fixed error using `auto_select_gpus=True` with `gpus=-1` (#4209)
- Disabled training when `limit_train_batches=0` (#4371)
- Fixed that metrics do not store computational graph for all seen data (#4313) 
- Fixed AMP unscale for `on_after_backward` (#4439)
- Fixed TorchScript export when module includes Metrics (#4428) 
- Fixed TorchScript trace method’s data to device and docstring (#4360) 
- Fixed CSV logger warning (#4419) 
- Fixed skip DDP parameter sync (#4301) 
- Fixed `WandbLogger` `_sanitize_callable` function (#4422)
- Fixed AMP Native `_unscale` gradient (#4441)
[1.0.4] - 2020-10-27¶
[1.0.4] - Added¶
- Added `dirpath` and `filename` parameter in `ModelCheckpoint` (#4213)
- Added plugins docs and DDPPlugin to customize ddp across all accelerators (#4258) 
- Added `strict` option to the scheduler dictionary (#3586)
- Added `fsspec` support for profilers (#4162)
- Added autogenerated helptext to `Trainer.add_argparse_args` (#4344)
- Added support for string values in `Trainer`’s `profiler` parameter (#3656)
- Added `optimizer_closure` to `optimizer.step` when supported (#4190)
- Added unification of regression metrics (#4166) 
- Added checkpoint load from Bytes (#4314) 
[1.0.4] - Changed¶
[1.0.4] - Deprecated¶
[1.0.4] - Fixed¶
- Fixed setting device ids in DDP (#4297) 
- Fixed synchronization of best model path in `ddp_accelerator` (#4323)
- Fixed `WandbLogger` not uploading checkpoint artifacts at the end of training (#4341)
- Fixed `FBeta` computation (#4183)
- Ensured accumulation across batches has completed before breaking the training loop (#4278)
- Fixed `ModelCheckpoint` to not increase current_epoch and global_step when not training (#4291)
- Fixed `COMET_EXPERIMENT_KEY` environment variable usage in comet logger (#4230)
[1.0.3] - 2020-10-20¶
[1.0.3] - Added¶
- Added persistent flag to `Metric.add_state` (#4195)
[1.0.3] - Changed¶
[1.0.3] - Fixed¶
[1.0.2] - 2020-10-15¶
[1.0.2] - Added¶
- Added trace functionality to the function `to_torchscript` (#4142)
[1.0.2] - Changed¶
- Called `on_load_checkpoint` before loading `state_dict` (#4057)
[1.0.2] - Removed¶
- Removed duplicate metric vs step log for train loop (#4173) 
[1.0.2] - Fixed¶
[1.0.1] - 2020-10-14¶
[1.0.1] - Added¶
- Added getstate/setstate method for torch.save serialization (#4127) 
[1.0.0] - 2020-10-13¶
[1.0.0] - Added¶
- Added Explained Variance Metric + metric fix (#4013) 
- Added Metric <-> Lightning Module integration tests (#4008) 
- Added parsing OS env vars in `Trainer` (#4022)
- Added classification metrics (#4043) 
- Updated explained variance metric (#4024) 
- Enabled plugins (#4041) 
- Enabled custom clusters (#4048) 
- Enabled passing in custom accelerators (#4050) 
- Added `LightningModule.toggle_optimizer` (#4058)
- Added `LightningModule.manual_backward` (#4063)
- Added `output` argument to `*_epoch_end` hooks (#3967)
[1.0.0] - Changed¶
[1.0.0] - Removed¶
- Removed support for EvalResult and TrainResult (#3968) 
- Removed deprecated trainer flags: `overfit_pct`, `log_save_interval`, `row_log_interval` (#3969)
- Removed deprecated early_stop_callback (#3982) 
- Removed deprecated model hooks (#3980) 
- Removed deprecated callbacks (#3979) 
- Removed `trainer` argument in `LightningModule.backward` (#4056)
[1.0.0] - Fixed¶
[0.10.0] - 2020-10-07¶
[0.10.0] - Added¶
- Enable PyTorch 1.7 compatibility (#3541) 
- Added `LightningModule.to_torchscript` to support exporting as `ScriptModule` (#3258)
- Added warning when dropping unpicklable `hparams` (#2874)
- Added EMB similarity (#3349) 
- Added `ModelCheckpoint.to_yaml` method (#3048)
- Allow `ModelCheckpoint` monitor to be `None`, meaning it will always save (#3630)
- Disabled optimizers setup during testing (#3059) 
- Added support for datamodules to save and load checkpoints when training (#3563) 
- Added support for datamodule in learning rate finder (#3425) 
- Added gradient clip test for native AMP (#3754) 
- Added dist lib to enable syncing anything across devices (#3762) 
- Added `broadcast` to `TPUBackend` (#3814)
- Added `XLADeviceUtils` class to check XLA device type (#3274)
[0.10.0] - Changed¶
- Refactored accelerator backends:
- moved TPU `xxx_step` to backend (#3118)
- refactored DDP backend `forward` (#3119)
- refactored GPU backend `__step` (#3120)
- remove obscure forward call in eval + CPU backend `___step` (#3123)
- reduced all simplified forward (#3126) 
- added hook base method (#3127) 
- refactor eval loop to use hooks - use `test_mode` for if so we can split later (#3129)
- moved `___step_end` hooks (#3130)
- training forward refactor (#3134) 
- training AMP scaling refactor (#3135) 
- eval step scaling factor (#3136) 
- add eval loop object to streamline eval loop (#3138) 
- refactored dataloader process hook (#3139) 
- refactored inner eval loop (#3141) 
- final inner eval loop hooks (#3154) 
- clean up hooks in `run_evaluation` (#3156)
- clean up data reset (#3161) 
- expand eval loop out (#3165) 
- moved hooks around in eval loop (#3195) 
- remove `_evaluate` fx (#3197)
- `Trainer.fit` hook clean up (#3198)
- DDPs train hooks (#3203) 
- reduced accelerator selection (#3211) 
- group prepare data hook (#3212) 
- added data connector (#3285) 
- modular is_overridden (#3290) 
- adding `Trainer.tune()` (#3293)
- move `run_pretrain_routine` -> `setup_training` (#3294)
- move train outside of setup training (#3297)
- move `prepare_data` to data connector (#3307)
- moved accelerator router (#3309) 
- train loop refactor - moving train loop to own object (#3310, #3312, #3313, #3314) 
- duplicate data interface definition up into DataHooks class (#3344) 
- inner train loop (#3359, #3361, #3362, #3363, #3365, #3366, #3367, #3368, #3369, #3370, #3371, #3372, #3373, #3374, #3375, #3376, #3385, #3388, #3397) 
- all logging related calls in a connector (#3395) 
- added model connector (#3407) 
- moved eval loop logging to loggers (#3408) 
- moved eval loop (#3412, #3408)
- move `lr_finder` (#3434)
- move specific accelerator code (#3457) 
- group connectors (#3472) 
- apex plugin (#3502) 
- precision plugins (#3504) 
- Result - make monitor default to `checkpoint_on` to simplify (#3571)
- reference to the Trainer on the `LightningDataModule` (#3684)
- add `.log` to lightning module (#3686, #3699, #3701, #3704, #3715)
- enable tracking original metric when step and epoch are both true (#3685) 
- deprecated results obj, added support for simpler comms (#3681) 
- move backends back to individual files (#3712) 
- fixes logging for eval steps (#3763) 
- decoupled DDP, DDP spawn (#3733, #3766, #3767, #3774, #3802, #3806, #3817, #3819, #3927) 
- remove weight loading hack for ddp_cpu (#3808) 
- separate `torchelastic` from DDP (#3810)
- separate SLURM from DDP (#3809) 
- decoupled DDP2 (#3816) 
- bug fix with logging val epoch end + monitor (#3812) 
- callback system and init DDP (#3836) 
- epoch can now log independently (#3843) 
- test selecting the correct backend. temp backends while slurm and TorchElastic are decoupled (#3848) 
- fixed `init_slurm_connection` causing hostname errors (#3856)
- moves init apex from LM to apex connector (#3923) 
- moves sync bn to each backend (#3925) 
- moves configure ddp to each backend (#3924) 
 
- Deprecation warning (#3844) 
- Changed `LearningRateLogger` to `LearningRateMonitor` (#3251)
- Used `fsspec` instead of `gfile` for all IO (#3320)
- Swapped `torch.load` for `fsspec` load in DDP spawn backend (#3787)
- Swapped `torch.load` for `fsspec` load in cloud_io loading (#3692)
- Added support for `to_disk()` to use remote filepaths with `fsspec` (#3930)
- Updated model_checkpoint’s to_yaml to use `fsspec` open (#3801)
- Fixed `fsspec` is inconsistent when doing `fs.ls` (#3805)
 
- Refactor `GPUStatsMonitor` to improve training speed (#3257)
- Changed IoU score behavior for classes absent in target and pred (#3098)
- Changed IoU `remove_bg` bool to `ignore_index` optional int (#3098)
- Changed defaults of `save_top_k` and `save_last` to `None` in ModelCheckpoint (#3680)
- `row_log_interval` and `log_save_interval` are now based on training loop’s `global_step` instead of epoch-internal batch index (#3667)
- Silenced some warnings. verified ddp refactors (#3483) 
- Cleaning up stale logger tests (#3490) 
- Allow `ModelCheckpoint` monitor to be `None` (#3633)
- Enable `None` model checkpoint default (#3669)
- Skipped `best_model_path` if `checkpoint_callback` is `None` (#2962)
- Used `raise .. from ..` to explicitly chain exceptions (#3750)
- Mocking loggers (#3596, #3617, #3851, #3859, #3884, #3853, #3910, #3889, #3926)
- Write predictions in LightningModule instead of EvalResult (#3882)
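For readers unfamiliar with the `raise .. from ..` form mentioned in the list above, this is a small sketch of explicit exception chaining. `load_config` and `cause_of_failure` are hypothetical helpers written for illustration only.

```python
def load_config(path):
    """Hypothetical loader that wraps low-level errors in a domain error."""
    try:
        with open(path) as fh:
            return fh.read()
    except OSError as err:
        # `from err` stores the original exception on __cause__,
        # so the traceback shows both errors and how they relate.
        raise RuntimeError(f"could not load config from {path!r}") from err


def cause_of_failure(path):
    """Return the chained low-level exception, or None if loading worked."""
    try:
        load_config(path)
    except RuntimeError as err:
        return err.__cause__
    return None
```

Calling `cause_of_failure` with a path that does not exist returns the underlying `OSError` subclass rather than losing it.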
[0.10.0] - Deprecated¶
- Deprecated `TrainResult` and `EvalResult`, use `self.log` and `self.write` from the `LightningModule` to log metrics and write predictions. `training_step` can now only return a scalar (for the loss) or a dictionary with anything you want. (#3681)
- Deprecate `early_stop_callback` Trainer argument (#3845)
- Rename Trainer arguments `row_log_interval` >> `log_every_n_steps` and `log_save_interval` >> `flush_logs_every_n_steps` (#3748)
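One common way to handle an argument rename like the one above is a small translation layer that accepts the old names and warns. The sketch below assumes nothing about how the library implements it; `translate_kwargs` is a hypothetical helper and only the two old/new name pairs come from the entry.

```python
import warnings

# Old name -> new name, taken from the rename entry above.
RENAMED = {
    "row_log_interval": "log_every_n_steps",
    "log_save_interval": "flush_logs_every_n_steps",
}


def translate_kwargs(**kwargs):
    """Hypothetical helper: map deprecated keyword names to their replacements."""
    out = {}
    for key, value in kwargs.items():
        new_key = RENAMED.get(key)
        if new_key is None:
            out[key] = value
            continue
        warnings.warn(
            f"`{key}` is deprecated, use `{new_key}` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # If the caller also passed the new name, the new name wins.
        out.setdefault(new_key, value)
    return out
```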
[0.10.0] - Removed¶
- Removed experimental Metric API (#3943, #3949, #3946), listed changes before final removal: 
- Added hooks to metric module interface (#2528) 
- Added error when AUROC metric is used for multiclass problems (#3350) 
- Fixed `ModelCheckpoint` with `save_top_k=-1` option not tracking the best models when a monitor metric is available (#3735)
- Fixed counter-intuitive error being thrown in `Accuracy` metric for zero target tensor (#3764)
- Fixed aggregation of metrics (#3517) 
- Fixed Metric aggregation (#3321) 
- Fixed RMSLE metric (#3188) 
- Renamed `reduction` to `class_reduction` in classification metrics (#3322)
- Changed `class_reduction` similar to sklearn for classification metrics (#3322)
- Renaming of precision recall metric (#3308) 
 
[0.10.0] - Fixed¶
- Fixed `on_train_batch_start` hook to end epoch early (#3700)
- Fixed `num_sanity_val_steps` is clipped to `limit_val_batches` (#2917)
- Fixed ONNX model save on GPU (#3145)
- Fixed `GpuUsageLogger` to work on different platforms (#3008)
- Fixed auto-scale batch size not dumping `auto_lr_find` parameter (#3151)
- Fixed `batch_outputs` with optimizer frequencies (#3229)
- Fixed setting batch size in `LightningModule.datamodule` when using `auto_scale_batch_size` (#3266)
- Fixed Horovod distributed backend compatibility with native AMP (#3404) 
- Fixed batch size auto scaling exceeding the size of the dataset (#3271) 
- Fixed getting `experiment_id` from MLFlow only once instead of each training loop (#3394)
- Fixed `overfit_batches` which now correctly disables shuffling for the training loader. (#3501)
- Fixed gradient norm tracking for `row_log_interval > 1` (#3489)
- Fixed `ModelCheckpoint` name formatting (#3164)
- Fixed example implementation of AutoEncoder (#3190) 
- Fixed invalid paths when remote logging with TensorBoard (#3236) 
- Fixed change `t()` to `transpose()` as XLA devices do not support `.t()` on 1-dim tensor (#3252)
- Fixed (weights only) checkpoints loading without PL (#3287)
- Fixed `gather_all_tensors` cross GPUs in DDP (#3319)
- Fixed CometML save dir (#3419) 
- Fixed forward key metrics (#3467) 
- Fixed normalize mode at confusion matrix (replace NaNs with zeros) (#3465) 
- Fixed global step increment in training loop when `training_epoch_end` hook is used (#3673)
- Fixed dataloader shuffling not getting turned off with `overfit_batches > 0` and `distributed_backend = "ddp"` (#3534)
- Fixed determinism in `DDPSpawnBackend` when using `seed_everything` in main process (#3335)
- Fixed `ModelCheckpoint` `period` to actually save every `period` epochs (#3630)
- Fixed `val_progress_bar` total with `num_sanity_val_steps` (#3751)
- Fixed Tuner dump: add `current_epoch` to dumped_params (#3261)
- Fixed `current_epoch` and `global_step` properties mismatch between `Trainer` and `LightningModule` (#3785)
- Fixed learning rate scheduler for optimizers with internal state (#3897) 
- Fixed `tbptt_reduce_fx` when non-floating tensors are logged (#3796)
- Fixed model checkpoint frequency (#3852) 
- Fixed logging non-tensor scalar with result breaks subsequent epoch aggregation (#3855) 
- Fixed `TrainerEvaluationLoopMixin` activates `model.train()` at the end (#3858)
- Fixed `overfit_batches` when using with multiple val/test_dataloaders (#3857)
- Fixed enables `training_step` to return `None` (#3862)
- Fixed init nan for checkpointing (#3863)
- Fixed for `load_from_checkpoint` (#2776)
- Fixes incorrect `batch_sizes` when Dataloader returns a dict with multiple tensors (#3668)
- Fixed unexpected signature for `validation_step` (#3947)
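The confusion-matrix entry in the list above (replace NaNs with zeros when normalizing) can be pictured with plain Python lists. This sketch assumes row-wise normalization, which the entry does not spell out, and `normalize_rows` is a hypothetical name.

```python
def normalize_rows(matrix):
    """Row-normalize a confusion matrix; an all-zero row stays all zeros."""
    normalized = []
    for row in matrix:
        total = sum(row)
        # A class with no samples would give 0/0 (NaN with tensors);
        # emit zeros for that row instead.
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized
```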
[0.9.0] - 2020-08-20¶
[0.9.0] - Added¶
- Added basic `CSVLogger` (#2721)
- Added SSIM metrics (#2671) 
- Added BLEU metrics (#2535) 
- Added support to export a model to ONNX format (#2596) 
- Added support for `Trainer(num_sanity_val_steps=-1)` to check all validation data before training (#2246)
- Added struct. output: 
- Added class `LightningDataModule` (#2668)
- Added support for PyTorch 1.6 (#2745) 
- Added call DataModule hooks implicitly in trainer (#2755) 
- Added support for Mean in DDP Sync (#2568) 
- Added remaining `sklearn` metrics: `AveragePrecision`, `BalancedAccuracy`, `CohenKappaScore`, `DCG`, `Hamming`, `Hinge`, `Jaccard`, `MeanAbsoluteError`, `MeanSquaredError`, `MeanSquaredLogError`, `MedianAbsoluteError`, `R2Score`, `MeanPoissonDeviance`, `MeanGammaDeviance`, `MeanTweedieDeviance`, `ExplainedVariance` (#2562)
- Added support for `limit_{mode}_batches (int)` to work with infinite dataloader (IterableDataset) (#2840)
- Added support returning python scalars in DP (#1935) 
- Added support to Tensorboard logger for OmegaConf `hparams` (#2846)
- Added tracking of basic states in `Trainer` (#2541)
- Tracks all outputs including TBPTT and multiple optimizers (#2890) 
- Added GPU Usage Logger (#2932) 
- Added `strict=False` for `load_from_checkpoint` (#2819)
- Added saving test predictions on multiple GPUs (#2926) 
- Auto log the computational graph for loggers that support this (#3003) 
- Added warning when changing monitor and using results obj (#3014) 
- Added a hook `transfer_batch_to_device` to the `LightningDataModule` (#3038)
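The idea behind an integer batch limit on an infinite data source, as in the `limit_{mode}_batches (int)` entry in the list above, can be sketched with the standard library alone. `endless_batches` and `take_batches` are hypothetical stand-ins and make no claim about the trainer's internals.

```python
import itertools


def endless_batches():
    """Hypothetical stand-in for a loader over an IterableDataset with no length."""
    index = 0
    while True:
        yield index
        index += 1


def take_batches(loader, limit):
    """Consume at most `limit` batches without ever asking for len(loader)."""
    return list(itertools.islice(loader, limit))
```

Because `islice` stops after `limit` items, the loop terminates even though the generator never does.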
[0.9.0] - Changed¶
- Truncated long version numbers in progress bar (#2594) 
- Enabling val/test loop disabling (#2692) 
- Refactored into `accelerator` module:
- Using `.comet.config` file for `CometLogger` (#1913)
- Updated hooks arguments - breaking for `setup` and `teardown` (#2850)
- Using `gfile` to support remote directories (#2164)
- Moved optimizer creation after device placement for DDP backends (#2904)
- Support `**DictConfig` for `hparam` serialization (#2519)
- Removed callback metrics from test results obj (#2994) 
- Re-enabled naming metrics in ckpt name (#3060) 
- Changed progress bar epoch counting to start from 0 (#3061) 
[0.9.0] - Deprecated¶
- Deprecated Trainer attribute `ckpt_path`, which will now be set by `weights_save_path` (#2681)
[0.9.0] - Removed¶
- Removed deprecated: (#2760)
- core decorator `data_loader`
- Module hook `on_sanity_check_start` and loading `load_from_metrics`
- package `pytorch_lightning.logging`
- Trainer arguments: `show_progress_bar`, `num_tpu_cores`, `use_amp`, `print_nan_grads`
- LR Finder argument `num_accumulation_steps`
 
[0.9.0] - Fixed¶
- Fixed `accumulate_grad_batches` for last batch (#2853)
- Fixed setup call while testing (#2624) 
- Fixed local rank zero casting (#2640) 
- Fixed single scalar return from training (#2587) 
- Fixed Horovod backend to scale LR schedulers with the optimizer (#2626)
- Fixed `dtype` and `device` properties not getting updated in submodules (#2657)
- Fixed `fast_dev_run` to run for all dataloaders (#2581)
- Fixed `save_dir` in loggers getting ignored by default value of `weights_save_path` when user did not specify `weights_save_path` (#2681)
- Fixed `weights_save_path` getting ignored when `logger=False` is passed to Trainer (#2681)
- Fixed TPU multi-core and Float16 (#2632) 
- Fixed test metrics not being logged with `LoggerCollection` (#2723)
- Fixed data transfer to device when using `torchtext.data.Field` and `include_lengths is True` (#2689)
- Fixed shuffle argument for distributed sampler (#2789) 
- Fixed logging interval (#2694) 
- Fixed wrong loss value in the progress bar when `accumulate_grad_batches > 1` (#2738)
- Fixed correct CWD for ddp sub-processes when using Hydra (#2719) 
- Fixed selecting GPUs using `CUDA_VISIBLE_DEVICES` (#2739)
- Fixed false `num_classes` warning in metrics (#2781)
- Fixed shell injection vulnerability in subprocess call (#2786) 
- Fixed LR finder and `hparams` compatibility (#2821)
- Fixed `ModelCheckpoint` not saving the latest information when `save_last=True` (#2881)
- Fixed ImageNet example: learning rate scheduler, number of workers and batch size when using DDP (#2889) 
- Fixed apex gradient clipping (#2829) 
- Fixed save apex scaler states (#2828) 
- Fixed a model loading issue with inheritance and variable positional arguments (#2911) 
- Fixed passing `non_blocking=True` when transferring a batch object that does not support it (#2910)
- Fixed checkpointing to remote file paths (#2925) 
- Fixed adding val step argument to metrics (#2986) 
- Fixed an issue that caused `Trainer.test()` to stall in ddp mode (#2997)
- Fixed gathering of results with tensors of varying shape (#3020) 
- Fixed batch size auto-scaling feature to set the new value on the correct model attribute (#3043) 
- Fixed automatic batch scaling not working with half precision (#3045) 
- Fixed setting device to root gpu (#3042) 
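The shell-injection entry in the list above does not describe the actual patch, but the usual remedy is to pass an argument list instead of a shell string. A minimal sketch under that assumption, with `echo_via_child` as a hypothetical helper:

```python
import subprocess
import sys


def echo_via_child(user_text):
    """Echo user_text through a child interpreter without involving a shell."""
    # Each list element becomes exactly one argv entry, so shell
    # metacharacters in user_text (";", "&&", "$(...)") are never interpreted.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

With `shell=True` and string formatting, the same input could append a second command; with a list, it is just data.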
[0.8.5] - 2020-07-09¶
[0.8.5] - Added¶
[0.8.5] - Removed¶
- Removed auto val reduce (#2462) 
[0.8.5] - Fixed¶
- Flattening Wandb Hyperparameters (#2459) 
- Fixed using the same DDP python interpreter and actually running (#2482) 
- Fixed model summary input type conversion for models that have input dtype different from model parameters (#2510) 
- Made `TensorBoardLogger` and `CometLogger` pickleable (#2518)
- Fixed a problem with `MLflowLogger` creating multiple run folders (#2502)
- Fixed global_step increment (#2455) 
- Fixed TPU hanging example (#2488) 
- Fixed `argparse` default value bug (#2526)
- Fixed Dice and IoU to avoid NaN by adding small eps (#2545) 
- Fixed accumulate gradients schedule at epoch 0 (continued) (#2513) 
- Fixed Trainer `.fit()` returning last not best weights in “ddp_spawn” (#2565)
- Fixed passing (do not pass) TPU weights back on test (#2566) 
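To see why a small epsilon matters for the Dice and IoU entry in the list above, consider two empty masks, where intersection and union are both zero. The sketch below adds `eps` to the denominator only, which is one common variant and an assumption here; `iou` is a hypothetical pure-Python helper.

```python
def iou(pred, target, eps=1e-6):
    """IoU of two binary masks given as flat lists of 0/1 values."""
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - intersection
    # Without eps, two empty masks would divide 0 by 0.
    return intersection / (union + eps)
```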
[0.8.4] - 2020-07-01¶
[0.8.4] - Added¶
[0.8.4] - Changed¶
- Enabled no returns from eval (#2446) 
[0.8.4] - Fixed¶
[0.8.3] - 2020-06-29¶
[0.8.3] - Fixed¶
[0.8.2] - 2020-06-28¶
[0.8.2] - Added¶
- Added TorchText support for moving data to GPU (#2379) 
[0.8.2] - Changed¶
[0.8.2] - Removed¶
- Moved `TrainsLogger` to Bolts (#2384)
[0.8.2] - Fixed¶
- Fixed parsing TPU arguments and TPU tests (#2094) 
- Fixed number batches in case of multiple dataloaders and `limit_{*}_batches` (#1920, #2226)
- Fixed an issue with forward hooks not being removed after model summary (#2298) 
- Fix for `load_from_checkpoint()` not working with absolute path on Windows (#2294)
- Fixed an issue with how _has_len handles `NotImplementedError`, e.g. raised by `torchtext.data.Iterator` (#2293), (#2307)
- Fixed `average_precision` metric (#2319)
- Fixed ROC metric for CUDA tensors (#2304) 
- Fixed lost compatibility with custom datatypes implementing `.to` (#2335)
- Fixed loading model with kwargs (#2387) 
- Fixed sum(0) for `trainer.num_val_batches` (#2268)
- Fixed checking if the parameters are a `DictConfig` Object (#2216)
- Fixed SLURM weights saving (#2341) 
- Fixed swaps LR scheduler order (#2356) 
- Fixed adding tensorboard `hparams` logging test (#2342)
- Fixed use model ref for tear down (#2360) 
- Fixed logger crash on DDP (#2388) 
- Fixed several issues with early stopping and checkpoint callbacks (#1504, #2391) 
- Fixed loading past checkpoints from v0.7.x (#2405) 
- Fixed loading model without arguments (#2403) 
- Fixed Windows compatibility issue (#2358) 
[0.8.1] - 2020-06-19¶
[0.8.1] - Fixed¶
[0.8.0] - 2020-06-18¶
[0.8.0] - Added¶
- Added `overfit_batches`, `limit_{val|test}_batches` flags (overfit now uses training set for all three) (#2213)
- Added metrics 
- Allow dataloaders without sampler field present (#1907) 
- Added option `save_last` to save the model at the end of every epoch in `ModelCheckpoint` (#1908)
- Early stopping checks `on_validation_end` (#1458)
- Speed up single-core TPU training by loading data using `ParallelLoader` (#2033)
- Added a model hook `transfer_batch_to_device` that enables moving custom data structures to the target device (#1756)
- Added black formatter for the code with code-checker on pull (#1610) 
- Added back the slow spawn ddp implementation as `ddp_spawn` (#2115)
- Added loading checkpoints from URLs (#1667) 
- Added a callback method `on_keyboard_interrupt` for handling KeyboardInterrupt events during training (#2134)
- Added a decorator `auto_move_data` that moves data to the correct device when using the LightningModule for inference (#1905)
- Added `ckpt_path` option to `LightningModule.test(...)` to load particular checkpoint (#2190)
- Added `setup` and `teardown` hooks for model (#2229)
[0.8.0] - Changed¶
- Allow user to select individual TPU core to train on (#1729) 
- Removed non-finite values from loss in `LRFinder` (#1862)
- Allow passing model hyperparameters as complete kwarg list (#1896) 
- Renamed `ModelCheckpoint`’s attributes `best` to `best_model_score` and `kth_best_model` to `kth_best_model_path` (#1799)
- Re-Enable Logger’s `ImportError`s (#1938)
- Changed the default value of the Trainer argument `weights_summary` from `full` to `top` (#2029)
- Raise an error when lightning replaces an existing sampler (#2020) 
- Enabled `prepare_data` from correct processes - clarify local vs global rank (#2166)
- Remove explicit flush from tensorboard logger (#2126) 
- Changed epoch indexing to start from 1 instead of 0 (#2206)
[0.8.0] - Deprecated¶
- Deprecated flags: (#2213)
- `overfit_pct` in favour of `overfit_batches`
- `val_percent_check` in favour of `limit_val_batches`
- `test_percent_check` in favour of `limit_test_batches`
 
- Deprecated `ModelCheckpoint`’s attributes `best` and `kth_best_model` (#1799)
- Dropped official support/testing for older PyTorch versions <1.3 (#1917) 
- Deprecated Trainer `proc_rank` in favour of `global_rank` (#2166, #2269)
[0.8.0] - Removed¶
- Removed unintended Trainer argument `progress_bar_callback`, the callback should be passed in by `Trainer(callbacks=[...])` instead (#1855)
- Removed obsolete `self._device` in Trainer (#1849)
- Removed deprecated API (#2073)
- Packages: `pytorch_lightning.pt_overrides`, `pytorch_lightning.root_module`
- Modules: `pytorch_lightning.logging.comet_logger`, `pytorch_lightning.logging.mlflow_logger`, `pytorch_lightning.logging.test_tube_logger`, `pytorch_lightning.overrides.override_data_parallel`, `pytorch_lightning.core.model_saving`, `pytorch_lightning.core.root_module`
- Trainer arguments: `add_row_log_interval`, `default_save_path`, `gradient_clip`, `nb_gpu_nodes`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`
- Trainer attributes: `nb_gpu_nodes`, `num_gpu_nodes`, `gradient_clip`, `max_nb_epochs`, `min_nb_epochs`, `nb_sanity_val_steps`, `default_save_path`, `tng_tqdm_dic`
 
[0.8.0] - Fixed¶
- Run graceful training teardown on interpreter exit (#1631) 
- Fixed user warning when apex was used together with learning rate schedulers (#1873) 
- Fixed multiple calls of `EarlyStopping` callback (#1863)
- Fixed an issue with `Trainer.from_argparse_args` when passing in unknown Trainer args (#1932)
- Fixed bug related to logger not being reset correctly for model after tuner algorithms (#1933) 
- Fixed root node resolution for SLURM cluster with dash in host name (#1954) 
- Fixed `LearningRateLogger` in multi-scheduler setting (#1944)
- Fixed test configuration check and testing (#1804) 
- Fixed an issue with Trainer constructor silently ignoring unknown/misspelled arguments (#1820) 
- Fixed `save_weights_only` in ModelCheckpoint (#1780)
- Allow use of same `WandbLogger` instance for multiple training loops (#2055)
- Fixed an issue with `_auto_collect_arguments` collecting local variables that are not constructor arguments and not working for signatures that have the instance not named `self` (#2048)
- Fixed mistake in parameters’ grad norm tracking (#2012) 
- Fixed CPU and hanging GPU crash (#2118) 
- Fixed an issue with the model summary and `example_input_array` depending on a specific ordering of the submodules in a LightningModule (#1773)
- Fixed TPU logging (#2230)
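The `_auto_collect_arguments` entry in the list above names two pitfalls: picking up locals that are not constructor arguments, and assuming the instance is called `self`. The sketch below shows one way to avoid both with `inspect`; `collect_init_args` and `Model` are hypothetical and are not the library's implementation.

```python
import inspect


def collect_init_args(init_fn, frame_locals):
    """Return only the entries of frame_locals that are parameters of init_fn."""
    params = list(inspect.signature(init_fn).parameters.values())
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    # Skip the first parameter by position: it is the instance, whatever its name.
    wanted = [p.name for p in params[1:] if p.kind not in variadic]
    return {name: frame_locals[name] for name in wanted if name in frame_locals}


class Model:
    def __init__(this, hidden, lr=0.1):  # instance deliberately not named `self`
        scratch = hidden * 2  # a local that is not a constructor argument
        this.hparams = collect_init_args(Model.__init__, locals())
```

Filtering by the signature keeps `scratch` and `this` out of the collected arguments.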
[0.7.6] - 2020-05-16¶
[0.7.6] - Added¶
- Added callback for logging learning rates (#1498) 
- Added transfer learning example (for a binary classification task in computer vision) (#1564) 
- Added type hints in `Trainer.fit()` and `Trainer.test()` to reflect that also a list of dataloaders can be passed in (#1723).
- Added auto scaling of batch size (#1638) 
- The progress bar metrics now also get updated in `training_epoch_end` (#1724)
- Enable `NeptuneLogger` to work with `distributed_backend=ddp` (#1753)
- Added option to provide seed to random generators to ensure reproducibility (#1572) 
- Added override for hparams in `load_from_ckpt` (#1797)
- Added support for multi-node distributed execution under `torchelastic` (#1811, #1818)
- Added dummy logger for internally disabling logging for some features (#1836) 
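The reproducibility entry in the list above (an option to seed the random generators) boils down to seeding every generator from one value. This sketch covers only Python's built-in generator; `seed_stdlib` is a hypothetical name, and a real helper would also seed NumPy and torch.

```python
import os
import random


def seed_stdlib(seed):
    """Seed Python's built-in RNG and record the seed for child processes."""
    # PYTHONHASHSEED only affects interpreters started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    return seed


seed_stdlib(1234)
first = [random.random() for _ in range(3)]
seed_stdlib(1234)
second = [random.random() for _ in range(3)]  # same draws as `first`
```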
[0.7.6] - Changed¶
- Enable `non-blocking` for device transfers to GPU (#1843)
- Replace meta_tags.csv with hparams.yaml (#1271)
- Reduction when `batch_size < num_gpus` (#1609)
- Updated LightningTemplateModel to look more like Colab example (#1577) 
- Don’t convert `namedtuple` to `tuple` when transferring the batch to target device (#1589)
- Allow passing hparams as keyword argument to LightningModule when loading from checkpoint (#1639) 
- Args should come after the last positional argument (#1807) 
- Made ddp the default if no backend specified with multiple GPUs (#1789) 
[0.7.6] - Deprecated¶
- Deprecated `tags_csv` in favor of `hparams_file` (#1271)
[0.7.6] - Fixed¶
- Fixed broken link in PR template (#1675) 
- Fixed ModelCheckpoint not None checking filepath (#1654) 
- Trainer now calls `on_load_checkpoint()` when resuming from a checkpoint (#1666)
- Fixed sampler logic for ddp with iterable dataset (#1734) 
- Fixed `_reset_eval_dataloader()` for IterableDataset (#1560)
- Fixed Horovod distributed backend to set the `root_gpu` property (#1669)
- Fixed wandb logger `global_step` affects other loggers (#1492)
- Fixed disabling progress bar on non-zero ranks using Horovod backend (#1709) 
- Fixed bugs that prevent lr finder to be used together with early stopping and validation dataloaders (#1676) 
- Fixed a bug in Trainer that prepended the checkpoint path with `version_` when it shouldn’t (#1748)
- Fixed lr key name in case of param groups in LearningRateLogger (#1719) 
- Fixed accumulation parameter and suggestion method for learning rate finder (#1801) 
- Fixed num processes not being set properly and the auto sampler failing in ddp (#1819)
- Fixed bugs in semantic segmentation example (#1824) 
- Fixed saving native AMP scaler state (#1777) 
- Fixed native amp + ddp (#1788) 
- Fixed `hparam` logging with metrics (#1647)
[0.7.5] - 2020-04-27¶
[0.7.5] - Changed¶
- Allow logging of metrics together with `hparams` (#1630)
[0.7.5] - Removed¶
- Removed Warning from trainer loop (#1634) 
[0.7.5] - Fixed¶
[0.7.4] - 2020-04-26¶
[0.7.4] - Added¶
- Added flag `replace_sampler_ddp` to manually disable sampler replacement in DDP (#1513)
- Added `auto_select_gpus` flag to trainer that enables automatic selection of available GPUs on exclusive mode systems.
- Added learning rate finder (#1347) 
- Added support for DDP mode in clusters without SLURM (#1387) 
- Added `test_dataloaders` parameter to `Trainer.test()` (#1434)
- Added `terminate_on_nan` flag to trainer that performs a NaN check with each training iteration when set to `True` (#1475)
- Added speed parity tests (max 1 sec difference per epoch) (#1482)
- Added `ddp_cpu` backend for testing ddp without GPUs (#1158)
- Added Horovod support as a distributed backend `Trainer(distributed_backend='horovod')` (#1529)
- Added support for 8 core distributed training on Kaggle TPUs (#1568)
[0.7.4] - Changed¶
- Changed the default behaviour to no longer include a NaN check with each training iteration (#1475) 
- Decoupled the progress bar from the trainer; it is a callback now and can be customized or even replaced entirely (#1450).
- Changed lr schedule step interval behavior to update every backwards pass instead of every forwards pass (#1477) 
- Defines shared proc. rank, remove rank from instances (e.g. loggers) (#1408) 
- Updated semantic segmentation example with custom U-Net and logging (#1371) 
- Disabled val and test shuffling (#1600) 
[0.7.4] - Deprecated¶
- Deprecated `training_tqdm_dict` in favor of `progress_bar_dict` (#1450).
[0.7.4] - Removed¶
- Removed `test_dataloaders` parameter from `Trainer.fit()` (#1434)
[0.7.4] - Fixed¶
- Added the possibility to pass nested metrics dictionaries to loggers (#1582) 
- Fixed memory leak from opt return (#1528) 
- Fixed saving checkpoint before deleting old ones (#1453) 
- Fixed loggers - flushing last logged metrics even before continue, e.g. `trainer.test()` results (#1459)
- Fixed optimizer configuration when `configure_optimizers` returns dict without `lr_scheduler` (#1443)
- Fixed `LightningModule` - mixing hparams and arguments in `LightningModule.__init__()` crashes load_from_checkpoint() (#1505)
- Added a missing call to the `on_before_zero_grad` model hook (#1493).
- Allow use of sweeps with `WandbLogger` (#1512)
- Fixed a bug that caused the `callbacks` Trainer argument to reference a global variable (#1534).
- Fixed a bug that set all boolean CLI arguments from `Trainer.add_argparse_args` always to True (#1571)
- Fixed do not copy the batch when training on a single GPU (#1576, #1579) 
- Fixed soft checkpoint removing on DDP (#1408) 
- Fixed automatic parser bug (#1585) 
- Fixed bool conversion from string (#1606) 
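The boolean CLI entries in the list above point at a well-known `argparse` pitfall: `type=bool` calls `bool()` on the raw string, and any non-empty string, including `"False"`, is truthy. A minimal sketch of an explicit converter follows; `str_to_bool` and the `--use_feature` flag are hypothetical names.

```python
import argparse


def str_to_bool(text):
    """Parse common spellings of true/false and reject anything else."""
    lowered = text.strip().lower()
    if lowered in {"1", "true", "t", "yes", "y"}:
        return True
    if lowered in {"0", "false", "f", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


parser = argparse.ArgumentParser()
# With type=bool, "--use_feature False" would parse as True.
parser.add_argument("--use_feature", type=str_to_bool, default=False)
```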
[0.7.3] - 2020-04-09¶
[0.7.3] - Added¶
- Added `rank_zero_warn` for warning only in rank 0 (#1428)
[0.7.3] - Fixed¶
[0.7.2] - 2020-04-07¶
[0.7.2] - Added¶
- Added same step loggers’ metrics aggregation (#1278) 
- Added parity test between a vanilla MNIST model and lightning model (#1284) 
- Added parity test between a vanilla RNN model and lightning model (#1351) 
- Added Reinforcement Learning - Deep Q-network (DQN) lightning example (#1232) 
- Added support for hierarchical `dict` (#1152)
- Added `TrainsLogger` class (#1122)
- Added type hints to `pytorch_lightning.core` (#946)
- Added support for `IterableDataset` in validation and testing (#1104)
- Added support for non-primitive types in `hparams` for `TensorboardLogger` (#1130)
- Added a check that stops the training when loss or weights contain `NaN` or `inf` values. (#1097)
- Added support for `IterableDataset` when `val_check_interval=1.0` (default), this will trigger validation at the end of each epoch. (#1283)
- Added `summary` method to Profilers. (#1259)
- Added informative errors if user defined dataloader has zero length (#1280) 
- Added testing for python 3.8 (#915) 
- Added model configuration checking (#1199) 
- Added support for optimizer frequencies through `LightningModule.configure_optimizers()` (#1269)
- Added option to run without an optimizer by returning `None` from `configure_optimizers`. (#1279)
- Added a warning when the number of data loader workers is small. (#1378) 
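The NaN/inf check described in the list above can be sketched on plain floats with `math.isfinite`. This illustrates the idea only; `first_non_finite` is a hypothetical helper, and the real check works on tensors.

```python
import math


def first_non_finite(named_values):
    """Return the name of the first entry holding NaN or +/-inf, else None."""
    for name, values in named_values.items():
        if any(not math.isfinite(value) for value in values):
            return name
    return None
```

A training loop could call such a check after each step and stop as soon as it returns a name.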
[0.7.2] - Changed¶
- Changed (renamed and refactored) `TensorRunningMean` -> `TensorRunningAccum`: running accumulations were generalized. (#1278)
- Changed `progress_bar_refresh_rate` trainer flag to disable progress bar when set to 0. (#1108)
- Enhanced `load_from_checkpoint` to also forward params to the model (#1307)
- Updated references to `self.forward()` to instead use the `__call__` interface. (#1211)
- Changed default behaviour of `configure_optimizers` to use no optimizer rather than Adam. (#1279)
- Allow to upload models on W&B (#1339) 
- On DP and DDP2 unsqueeze is automated now (#1319) 
- Did not always create a DataLoader during reinstantiation, but the same type as before (if subclass of DataLoader) (#1346) 
- Did not interfere with a default sampler (#1318) 
- Remove default Adam optimizer (#1317) 
- Give warnings for unimplemented required lightning methods (#1317) 
- Made `evaluate` method private >> `Trainer._evaluate(...)`. (#1260)
- Simplify the PL examples structure (shallower and more readable) (#1247) 
- Changed min max gpu memory to be on their own plots (#1358) 
- Remove `.item` which causes sync issues (#1254)
- Changed smoothing in TQDM to decrease variability of time remaining between training / eval (#1194) 
- Change default logger to dedicated one (#1064) 
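The generalization from a running mean to a running accumulator, mentioned in the list above, can be pictured as a fixed-size window with several reductions. The class below is a hypothetical float-based sketch, not the library's tensor implementation.

```python
from collections import deque


class RunningAccum:
    """Keep the last `window_length` values and expose several reductions."""

    def __init__(self, window_length):
        self._window = deque(maxlen=window_length)

    def append(self, value):
        # Once the window is full, the oldest value is dropped automatically.
        self._window.append(value)

    def _reduce(self, fn):
        # Reduce only over values seen so far, so a partly filled
        # window is not diluted by empty slots.
        return fn(self._window) if self._window else None

    def mean(self):
        return self._reduce(lambda window: sum(window) / len(window))

    def max(self):
        return self._reduce(max)

    def min(self):
        return self._reduce(min)
```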
[0.7.2] - Deprecated¶
[0.7.2] - Removed¶
[0.7.2] - Fixed¶
- Fixed `model_checkpoint` when saving all models (#1359)
- `Trainer.add_argparse_args` classmethod fixed. Now it adds a type for the arguments (#1147)
- Fixed bug related to type checking of `ReduceLROnPlateau` lr schedulers (#1126)
- Fixed a bug to ensure lightning checkpoints to be backward compatible (#1132) 
- Fixed a bug that created an extra dataloader with active `reload_dataloaders_every_epoch` (#1196)
- Fixed all warnings and errors in the docs build process (#1191) 
- Fixed an issue where `val_percent_check=0` would not disable validation (#1251)
- Fixed average of incomplete `TensorRunningMean` (#1309)
- Fixed `WandbLogger.watch` with `wandb.init()` (#1311)
- Fixed an issue with early stopping that would prevent it from monitoring training metrics when validation is disabled / not implemented (#1235). 
- Fixed a bug that would cause `trainer.test()` to run on the validation set when overloading `validation_epoch_end` and `test_end` (#1353)
- Fixed `WandbLogger.watch` - use of the watch method without importing `wandb` (#1311)
- Fixed `WandbLogger` to be used with ‘ddp’ - allow reinits in sub-processes (#1149, #1360)
- Made `training_epoch_end` behave like `validation_epoch_end` (#1357)
- Fixed `fast_dev_run` running validation twice (#1365)
- Fixed pickle error from quick patch `__code__` (#1352)
- Fixed checkpointing interval (#1272) 
- Fixed validation and training loops run the partial dataset (#1192) 
- Fixed running `on_validation_end` only on main process in DDP (#1125)
- Fixed `load_spawn_weights` only in proc rank 0 (#1385)
- Fixes using deprecated `use_amp` attribute (#1145)
- Fixed Tensorboard logger error: lightning_logs directory not exists in multi-node DDP on nodes with rank != 0 (#1377) 
- Fixed `Unimplemented backend XLA` error on TPU (#1387)
[0.7.1] - 2020-03-07¶
[0.7.1] - Fixed¶
- Fixes `print` issues and `data_loader` (#1080)
[0.7.0] - 2020-03-06¶
[0.7.0] - Added¶
- Added automatic sampler setup. Depending on DDP or TPU, lightning configures the sampler correctly (user needs to do nothing) (#926) 
- Added `reload_dataloaders_every_epoch=False` flag for trainer. Some users require reloading data every epoch (#926)
- Added `progress_bar_refresh_rate=50` flag for trainer. Throttle refresh rate on notebooks (#926)
- Updated governance docs 
- Added a check to ensure that the metric used for early stopping exists before training commences (#542) 
- Added `optimizer_idx` argument to `backward` hook (#733)
- Added `entity` argument to `WandbLogger` to be passed to `wandb.init` (#783)
- Added a tool for profiling training runs (#782) 
- Improved flexibility for naming of TensorBoard logs, can now set `version` to a `str` to just save to that directory, and use `name=''` to prevent experiment-name directory (#804)
- Added option to specify `step` key when logging metrics (#808)
- Added `train_dataloader`, `val_dataloader` and `test_dataloader` arguments to `Trainer.fit()`, for alternative data parsing (#759)
- Added Tensor Processing Unit (TPU) support (#868) 
- Split callbacks in multiple files (#849) 
- Added support for multiple loggers to be passed to - Traineras an iterable (e.g. list, tuple, etc.) (#903)
- Added support for step-based learning rate scheduling (#941) 
- Added support for logging - hparamsas dict (#1029)
- Checkpoint and early stopping now work without val. step (#1041) 
- Support graceful training cleanup after Keyboard Interrupt (#856, #1019) 
- Added type hints for function arguments (#912, ) 
- Added TPU gradient clipping (#963) 
- Added max/min number of steps in - Trainer(#728)
[0.7.0] - Changed¶
- Improved `NeptuneLogger` by adding `close_after_fit` argument to allow logging after training (#908)
- Changed default TQDM to use `tqdm.auto` for prettier outputs in IPython notebooks (#752)
- Changed `pytorch_lightning.logging` to `pytorch_lightning.loggers` (#767)
- Moved the default `tqdm_dict` definition from Trainer to `LightningModule`, so it can be overridden by the user (#749)
- Moved functionality of `LightningModule.load_from_metrics` into `LightningModule.load_from_checkpoint` (#995)
- Changed Checkpoint path parameter from `filepath` to `dirpath` (#1016)
- Froze the model's `hparams` as a `Namespace` property (#1029)
- Dropped `logging` config in package init (#1015)
- Renamed model steps (#1051)
  - `training_end` >> `training_epoch_end`
  - `validation_end` >> `validation_epoch_end`
  - `test_end` >> `test_epoch_end`
- Refactored dataloading, supports infinite dataloader (#955)
- Created single file in `TensorBoardLogger` (#777)
[0.7.0] - Deprecated¶
[0.7.0] - Removed¶
[0.7.0] - Fixed¶
- Fixed a bug where early stopping `on_end_epoch` would be called inconsistently when `check_val_every_n_epoch == 0` (#743)
- Fixed a bug where the model checkpointer didn’t write to the same directory as the logger (#771)
- Fixed a bug where the `TensorBoardLogger` class would create an additional empty log file during fitting (#777)
- Fixed a bug where `global_step` was advanced incorrectly when using `accumulate_grad_batches > 1` (#832)
- Fixed a bug when calling `self.logger.experiment` with multiple loggers (#1009)
- Fixed a bug when calling `logger.append_tags` on a `NeptuneLogger` with a single tag (#1009)
- Fixed sending back data from `.spawn` by saving and loading the trained model in/out of the process (#1017)
- Fixed port collision on DDP (#1010)
- Fixed/tested pass overrides (#918)
- Fixed comet logger to log after train (#892)
- Removed deprecated args to learning rate step function (#890)
[0.6.0] - 2020-01-21¶
[0.6.0] - Added¶
- Added support for resuming from a specific checkpoint via `resume_from_checkpoint` argument (#516)
- Added support for `ReduceLROnPlateau` scheduler (#320)
- Added support for Apex mode `O2` in conjunction with Data Parallel (#493)
- Added option (`save_top_k`) to save the top k models in the `ModelCheckpoint` class (#128)
- Added `on_train_start` and `on_train_end` hooks to `ModelHooks` (#598)
- Added `TensorBoardLogger` (#607)
- Added support for weight summary of model with multiple inputs (#543)
- Added `map_location` argument to `load_from_metrics` and `load_from_checkpoint` (#625)
- Added option to disable validation by setting `val_percent_check=0` (#649)
- Added `NeptuneLogger` class (#648)
- Added `WandbLogger` class (#627)
[0.6.0] - Changed¶
- Changed the default progress bar to print to stdout instead of stderr (#531) 
- Renamed `step_idx` to `step`, `epoch_idx` to `epoch`, `max_num_epochs` to `max_epochs` and `min_num_epochs` to `min_epochs` (#589)
- Renamed `total_batch_nb` to `total_batches`, `nb_val_batches` to `num_val_batches`, `nb_training_batches` to `num_training_batches`, `max_nb_epochs` to `max_epochs`, `min_nb_epochs` to `min_epochs`, and `nb_test_batches` to `num_test_batches` (#567)
- Changed gradient logging to use parameter names instead of indexes (#660) 
- Changed the default logger to `TensorBoardLogger` (#609)
- Changed the directory for tensorboard logging to be the same as model checkpointing (#706) 
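Code that still passes the old names can apply the renames above mechanically. The following is a minimal sketch, not part of the library: `RENAMES` and `migrate_kwargs` are hypothetical names, and the mapping covers only a subset of the pairs listed in #589 and #567. This changelog does not state which objects accept each name, so treat the mapping as illustrative.

```python
import warnings

# Old -> new names copied from the rename entries above (#589, #567).
RENAMES = {
    "max_num_epochs": "max_epochs",
    "min_num_epochs": "min_epochs",
    "max_nb_epochs": "max_epochs",
    "min_nb_epochs": "min_epochs",
}


def migrate_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with old names replaced (hypothetical helper)."""
    migrated = {}
    for name, value in kwargs.items():
        new_name = RENAMES.get(name, name)
        if new_name in migrated:
            # The same setting arrived twice, e.g. under an old and a new name.
            raise ValueError(f"'{new_name}' was given more than once")
        if new_name != name:
            warnings.warn(f"'{name}' was renamed to '{new_name}'", DeprecationWarning)
        migrated[new_name] = value
    return migrated


print(migrate_kwargs({"max_nb_epochs": 10, "gpus": 1}))
```

Raising on a duplicate, rather than silently preferring one value, keeps a half-migrated configuration from hiding a conflict.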
[0.6.0] - Deprecated¶
[0.6.0] - Removed¶
- Removed the `save_best_only` argument from `ModelCheckpoint`, use `save_top_k=1` instead (#128)
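To see why `save_top_k=1` subsumes `save_best_only`, here is a rough sketch of top-k bookkeeping. `TopKTracker` is a hypothetical class written for illustration and is not how `ModelCheckpoint` is implemented; it assumes each checkpoint has a single scalar score and that lower is better when `mode="min"`.

```python
import heapq
from typing import List, Optional, Tuple


class TopKTracker:
    """Keep the k best (score, path) pairs (illustrative only)."""

    def __init__(self, k: int = 1, mode: str = "min") -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.k = k
        self.sign = -1.0 if mode == "min" else 1.0
        # Min-heap on the signed score, so heap[0] is the worst kept entry.
        self._heap: List[Tuple[float, str]] = []

    def update(self, score: float, path: str) -> Tuple[bool, Optional[str]]:
        """Return (kept, evicted_path) for a newly scored checkpoint."""
        key = self.sign * score  # a larger key means a better checkpoint
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (key, path))
            return True, None
        if key > self._heap[0][0]:
            _, evicted = heapq.heapreplace(self._heap, (key, path))
            return True, evicted
        return False, None

    def kept_paths(self) -> List[str]:
        """Paths currently kept, best first."""
        return [path for _, path in sorted(self._heap, reverse=True)]


tracker = TopKTracker(k=1)
for score, path in [(0.5, "a.ckpt"), (0.7, "b.ckpt"), (0.3, "c.ckpt")]:
    print(path, tracker.update(score, path))
```

With `k=1` the tracker keeps only the single best checkpoint, which is the behaviour the removed flag described.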
[0.6.0] - Fixed¶
- Fixed a bug which occurred when using Adagrad with cuda (#554)
- Fixed a bug where training would be on the GPU despite setting `gpus=0` or `gpus=[]` (#561)
- Fixed an error with `print_nan_gradients` when some parameters do not require gradient (#579)
- Fixed a bug where the progress bar would show an incorrect number of total steps during the validation sanity check when using multiple validation data loaders (#597)
- Fixed support for PyTorch 1.1.0 (#552)
- Fixed an issue with early stopping when using a `val_check_interval < 1.0` in `Trainer` (#492)
- Fixed bugs relating to the `CometLogger` object that would cause it to not work properly (#481)
- Fixed a bug that would occur when returning `-1` from `on_batch_start` following an early exit or when the batch was `None` (#509)
- Fixed a potential race condition with several processes trying to create checkpoint directories (#530)
- Fixed a bug where batch ‘segments’ would remain on the GPU when using `truncated_bptt > 1` (#532)
- Fixed a bug when using `IterableDataset` (#547)
- Fixed a bug where `.item` was called on non-tensor objects (#602)
- Fixed a bug where `Trainer.train` would crash on an uninitialized variable if the trainer was run after resuming from a checkpoint that was already at `max_epochs` (#608)
- Fixed a bug where early stopping would begin two epochs early (#617)
- Fixed a bug where `num_training_batches` and `num_test_batches` would sometimes be rounded down to zero (#649)
- Fixed a bug where an additional batch would be processed when manually setting `num_training_batches` (#653)
- Fixed a bug when batches did not have a `.copy` method (#701)
- Fixed a bug when using `log_gpu_memory=True` in Python 3.6 (#715)
- Fixed a bug where checkpoint writing could exit before completion, giving incomplete checkpoints (#689)
- Fixed a bug where `on_train_end` was not called when early stopping (#723)
[0.5.3] - 2019-11-06¶
[0.5.3] - Added¶
- Added option to disable default logger, checkpointer, and early stopping by passing `logger=False`, `checkpoint_callback=False` and `early_stop_callback=False` respectively
- Added `CometLogger` for use with Comet.ml
- Added `val_check_interval` argument to `Trainer` allowing validation to be performed at every given number of batches
- Added functionality to save and load hyperparameters using the standard checkpoint mechanism
- Added call to `torch.cuda.empty_cache` before training starts
- Added option for user to override the call to `backward`
- Added support for truncated backprop through time via the `truncated_bptt_steps` argument in `Trainer`
- Added option to operate on all outputs from `training_step` in DDP2
- Added a hook for modifying DDP init
- Added a hook for modifying Apex
[0.5.3] - Changed¶
- Changed experiment version to be padded with zeros (e.g. `/dir/version_9` becomes `/dir/version_0009`)
- Changed callback metrics to include any metrics given in logs or progress bar
- Changed the default for `save_best_only` in `ModelCheckpoint` to `True`
- Added `tng_data_loader` for backwards compatibility
- Renamed `MLFlowLogger.client` to `MLFlowLogger.experiment` for consistency
- Moved `global_step` increment to happen after the batch has been processed
- Changed weights restore to first attempt HPC weights before restoring normally, preventing both weights being restored and running out of memory
- Changed progress bar functionality to add multiple progress bars for train/val/test
- Changed calls to `print` to use `logging` instead
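The zero-padded version directories mentioned above can be illustrated with a short sketch. `version_dir` is a hypothetical helper, and the width of 4 is inferred only from the `/dir/version_0009` example in the entry.

```python
def version_dir(root: str, version: int, width: int = 4) -> str:
    """Build a zero-padded version directory path (hypothetical helper)."""
    return f"{root}/version_{version:0{width}d}"


print(version_dir("/dir", 9))  # /dir/version_0009

# Padding makes plain string sorting agree with numeric order.
unpadded = sorted(f"version_{v}" for v in (9, 10))
padded = sorted(version_dir("", v).lstrip("/") for v in (9, 10))
print(unpadded)  # ['version_10', 'version_9']
print(padded)    # ['version_0009', 'version_0010']
```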
[0.5.3] - Deprecated¶
- Deprecated `tng_dataloader`
[0.5.3] - Fixed¶
- Fixed an issue where the number of batches was off by one during training 
- Fixed a bug that occurred when setting a checkpoint callback and `early_stop_callback=False`
- Fixed an error when importing CometLogger
- Fixed a bug where the `gpus` argument had some unexpected behaviour
- Fixed a bug where the computed total number of batches was sometimes incorrect
- Fixed a bug where the progress bar would sometimes not show the total number of batches in test mode
- Fixed a bug when using the `log_gpu_memory='min_max'` option in `Trainer`
- Fixed a bug where checkpointing would sometimes erase the current directory 
[0.5.2] - 2019-10-10¶
[0.5.2] - Added¶
- Added `weights_summary` argument to `Trainer` to be set to `full` (full summary), `top` (just top level modules) or other
- Added `tags` argument to `MLFlowLogger`
[0.5.2] - Changed¶
- Changed default for `amp_level` to `O1`
[0.5.2] - Removed¶
- Removed the `print_weights_summary` argument from `Trainer`
[0.5.2] - Fixed¶
- Fixed a bug where logs were not written properly 
- Fixed a bug where `logger.finalize` wasn’t called after training is complete
- Fixed callback metric errors in DDP 
- Fixed a bug where `TestTubeLogger` didn’t log to the correct directory
[0.5.1] - 2019-10-05¶
[0.5.1] - Added¶
- Added the `LightningLoggerBase` class for experiment loggers
- Added `MLFlowLogger` for logging with `mlflow`
- Added `TestTubeLogger` for logging with `test_tube`
- Added a different implementation of DDP (`distributed_backend='ddp2'`) where every node has one model using all GPUs
- Added support for optimisers which require a closure (e.g. LBFGS)
- Added automatic `MASTER_PORT` default for DDP when not set manually
- Added new GPU memory logging options `'min_max'` (log only the min/max utilization) and `'all'` (log all the GPU memory)
[0.5.1] - Changed¶
- Changed schedulers to always be called with the current epoch 
- Changed `test_tube` to an optional dependency
- Changed data loaders to internally use a getter instead of a python property 
- Disabled auto GPU loading when restoring weights to prevent out of memory errors 
- Changed logging, early stopping and checkpointing to occur by default 
[0.5.1] - Fixed¶
- Fixed a bug with samplers that do not specify `set_epoch`
- Fixed a bug when using the `MLFlowLogger` with unsupported data types, this will now raise a warning
- Fixed a bug where gradient norms were always zero using `track_grad_norm`
- Fixed a bug which caused a crash when logging memory
[0.5.0] - 2019-09-26¶
[0.5.0] - Changed¶
- Changed `data_batch` argument to `batch` throughout
- Changed `batch_i` argument to `batch_idx` throughout
- Changed `tng_dataloader` method to `train_dataloader`
- Changed `on_tng_metrics` method to `on_training_metrics`
- Changed `gradient_clip` argument to `gradient_clip_val`
- Changed `add_log_row_interval` to `row_log_interval`
[0.5.0] - Fixed¶
- Fixed a bug with tensorboard logging in multi-gpu setup 
[0.4.9] - 2019-09-16¶
[0.4.9] - Added¶
- Added the flag `log_gpu_memory` to `Trainer` to deactivate logging of GPU memory utilization
- Added SLURM resubmit functionality (port from test-tube) 
- Added optional weight_save_path to trainer to remove the need for a checkpoint_callback when using cluster training 
- Added option to use single gpu per node with `DistributedDataParallel`
[0.4.9] - Changed¶
- Changed functionality of `validation_end` and `test_end` with multiple dataloaders to be given all of the dataloaders at once rather than in separate calls
- Changed print_nan_grads to only print the parameter value and gradients when they contain NaN
- Changed gpu API to take integers as well (e.g. `gpus=2` instead of `gpus=[0, 1]`)
- All models now loaded on to CPU to avoid device and out of memory issues in PyTorch 
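The integer form of `gpus` above can be read as shorthand for the first N device indices. The sketch below shows one plausible normalization consistent with the `gpus=2` example; `normalize_gpus` is a hypothetical helper, not the trainer's actual parsing code.

```python
from typing import List, Optional, Sequence, Union


def normalize_gpus(gpus: Optional[Union[int, Sequence[int]]]) -> List[int]:
    """Turn an int or a sequence of ids into an explicit id list (hypothetical helper)."""
    if gpus is None:
        return []
    if isinstance(gpus, bool):
        # bool is a subclass of int; reject it to avoid surprising results.
        raise TypeError("gpus must be an int or a sequence of ints")
    if isinstance(gpus, int):
        if gpus < 0:
            raise ValueError("gpus must be non-negative")
        return list(range(gpus))  # assumed: 2 means devices 0 and 1
    return list(gpus)


print(normalize_gpus(2))       # [0, 1]
print(normalize_gpus([0, 1]))  # [0, 1]
print(normalize_gpus(0))       # []
```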
[0.4.9] - Fixed¶
- Fixed a bug where data types that implement `.to` but not `.cuda` would not be properly moved onto the GPU
- Fixed a bug where data would not be re-shuffled every epoch when using a `DistributedSampler`
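The re-shuffling fix above concerns a general property of distributed samplers: every process must derive the same permutation, and that permutation should change with the epoch. PyTorch's `DistributedSampler` exposes `set_epoch` for this purpose. The sketch below illustrates the idea in plain Python without torch; `shard_indices` is a hypothetical helper, and seeding with `seed + epoch` is an assumption about one reasonable scheme.

```python
import random
from typing import List


def shard_indices(num_samples: int, rank: int, world_size: int,
                  epoch: int, seed: int = 0) -> List[int]:
    """Indices seen by one process in one epoch (hypothetical helper)."""
    indices = list(range(num_samples))
    # Same seed on every rank -> same permutation everywhere;
    # adding the epoch -> a different permutation each epoch.
    random.Random(seed + epoch).shuffle(indices)
    # Each rank takes every world_size-th index, starting at its own rank.
    return indices[rank::world_size]


for epoch in range(2):
    shards = [shard_indices(8, rank, world_size=2, epoch=epoch) for rank in range(2)]
    print(f"epoch {epoch}: {shards}")
```

If the epoch were never passed in, every epoch would reuse the permutation for epoch 0, which matches the symptom described in the entry.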
[0.4.8] - 2019-08-31¶
[0.4.8] - Added¶
- Added `test_step` and `test_end` methods, used when `Trainer.test` is called
- Added `GradientAccumulationScheduler` callback which can be used to schedule changes to the number of accumulation batches
- Added option to skip the validation sanity check by setting `nb_sanity_val_steps = 0`
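To make the idea of scheduling the number of accumulation batches concrete, here is a minimal sketch that looks up the factor in effect for a given epoch. It assumes the schedule is a dict mapping a starting epoch to a factor; `accumulation_factor` is a hypothetical helper and does not reproduce the callback's implementation.

```python
import bisect
from typing import Dict


def accumulation_factor(schedule: Dict[int, int], epoch: int, default: int = 1) -> int:
    """Return the factor whose starting epoch is the latest one <= epoch."""
    starts = sorted(schedule)
    idx = bisect.bisect_right(starts, epoch) - 1
    return schedule[starts[idx]] if idx >= 0 else default


schedule = {2: 4, 5: 8}  # assumed format: {starting_epoch: batches_to_accumulate}
for epoch in (0, 2, 4, 5, 10):
    print(epoch, accumulation_factor(schedule, epoch))
```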
[0.4.8] - Fixed¶
- Fixed a bug when setting `nb_sanity_val_steps = 0`
[0.4.7] - 2019-08-24¶
[0.4.7] - Changed¶
- Changed the default `val_check_interval` to `1.0`
- Changed defaults for `nb_val_batches`, `nb_tng_batches` and `nb_test_batches` to 0
[0.4.7] - Fixed¶
- Fixed a bug where the full validation set was used despite setting `val_percent_check`
- Fixed a bug where an `Exception` was thrown when using a data set containing a single batch
- Fixed a bug where an `Exception` was thrown if no `val_dataloader` was given
- Fixed a bug where tuples were not properly transferred to the GPU
- Fixed a bug where data of a non standard type was not properly handled by the trainer
- Fixed a bug when loading data as a tuple
- Fixed a bug where `AttributeError` could be suppressed by the `Trainer`
[0.4.6] - 2019-08-15¶
[0.4.6] - Added¶
- Added support for data to be given as a `dict` or `list` with a single gpu
- Added support for `configure_optimizers` to return a single optimizer, two lists (optimizers and schedulers), or a single list
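The three return shapes listed for `configure_optimizers` can be normalized into a pair of lists. The sketch below shows one way to tell them apart; `split_optimizers` is a hypothetical helper, the strings stand in for real optimizer and scheduler objects, and the rule "a pair whose first element is a list means two lists" is an assumption.

```python
from typing import Any, List, Tuple


def split_optimizers(result: Any) -> Tuple[List[Any], List[Any]]:
    """Normalize a return value into (optimizers, schedulers) (hypothetical helper)."""
    is_sequence = isinstance(result, (list, tuple))
    # Two lists: ([optimizers...], [schedulers...])
    if is_sequence and len(result) == 2 and isinstance(result[0], list):
        optimizers, schedulers = result
        return list(optimizers), list(schedulers)
    # A single list of optimizers, no schedulers
    if is_sequence:
        return list(result), []
    # A single optimizer
    return [result], []


print(split_optimizers("adam"))                   # (['adam'], [])
print(split_optimizers(["adam", "sgd"]))          # (['adam', 'sgd'], [])
print(split_optimizers((["adam"], ["step_lr"])))  # (['adam'], ['step_lr'])
```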
[0.4.6] - Fixed¶
- Fixed a bug where returning just an optimizer list (i.e. without schedulers) from `configure_optimizers` would throw an `Exception`
[0.4.5] - 2019-08-13¶
[0.4.5] - Added¶
- Added `optimizer_step` method that can be overridden to change the standard optimizer behaviour
[0.4.4] - 2019-08-12¶
[0.4.4] - Added¶
- Added support for multiple validation dataloaders
- Added support for latest test-tube logger (optimised for `torch==1.2.0`)
[0.4.4] - Changed¶
- `validation_step` and `val_dataloader` are now optional
- `lr_scheduler` is now activated after epoch
[0.4.4] - Fixed¶
- Fixed a bug where a warning would show when using `lr_scheduler` in `torch>1.1.0`
- Fixed a bug where an `Exception` would be thrown if using `torch.DistributedDataParallel` without using a `DistributedSampler`, this now throws a `Warning` instead
[0.4.3] - 2019-08-10¶
[0.4.3] - Fixed¶
- Fixed a bug where gradient accumulation would scale the loss incorrectly
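The entry above does not describe the fix, but the underlying arithmetic is easy to check. With equally sized micro-batches and a mean-reduced loss, dividing each micro-batch loss by the number of accumulation steps makes the summed gradient equal to the full-batch gradient. The sketch below verifies this for a one-parameter model `y = w * x`; all names are illustrative, and this is not the library's code.

```python
from typing import Sequence


def grad_mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Derivative with respect to w of mean((w * x - y) ** 2)."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5
accum_steps = 2

full_batch = grad_mse(w, xs, ys)

# Two micro-batches of equal size, accumulated with and without scaling.
chunks = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
scaled = sum(grad_mse(w, cx, cy) / accum_steps for cx, cy in chunks)
unscaled = sum(grad_mse(w, cx, cy) for cx, cy in chunks)

print(full_batch, scaled, unscaled)  # -22.5 -22.5 -45.0
```

Without the division the accumulated gradient is `accum_steps` times too large, which acts like an unintended increase in the learning rate.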
[0.4.2] - 2019-08-08¶
[0.4.2] - Changed¶
- Changed install requirement to `torch==1.2.0`
[0.4.1] - 2019-08-08¶
[0.4.1] - Changed¶
- Changed install requirement to `torch==1.1.0`
[0.4.0] - 2019-08-08¶
[0.4.0] - Added¶
- Added 16-bit support for a single GPU 
- Added support for training continuation (preserves epoch, global step etc.) 
[0.4.0] - Changed¶
- Changed `training_step` and `validation_step`, outputs will no longer be automatically reduced
[0.4.0] - Removed¶
- Removed need for `Experiment` object in `Trainer`
[0.4.0] - Fixed¶
- Fixed issues with reducing outputs from generative models (such as images and text) 
[0.3.6] - 2019-07-25¶
[0.3.6] - Added¶
- Added a decorator to do lazy data loading internally 
[0.3.6] - Fixed¶
- Fixed a bug where `Experiment` object was not process safe, potentially causing logs to be overwritten