Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Note: we move fast, but we still preserve backward compatibility one feature release (0.1 version) back.


[UnReleased] - 2024-MM-DD

[UnReleased] - Added

[UnReleased] - Changed

[UnReleased] - Removed

[UnReleased] - Fixed

  • Fixed issue with shared state in metric collection when using dice score (#2848)


[1.6.0] - 2024-11-12

[1.6.0] - Added

  • Added audio metric NISQA (#2792)

  • Added classification metric LogAUC (#2377)

  • Added classification metric NegativePredictiveValue (#2433)

  • Added regression metric NormalizedRootMeanSquaredError (#2442)

  • Added segmentation metric Dice (#2725)

  • Added method merge_state to Metric (#2786)

  • Added support for propagation of the autograd graph in ddp setting (#2754)

[1.6.0] - Changed

  • Changed naming and input order of arguments in KLDivergence (#2800)

[1.6.0] - Deprecated

  • Deprecated Dice from classification metrics (#2725)

[1.6.0] - Removed

  • Changed minimum supported Pytorch version to 2.0 (#2671)

  • Dropped support for Python 3.8 (#2827)

  • Removed num_outputs in R2Score (#2800)

[1.6.0] - Fixed

  • Fixed segmentation Dice + GeneralizedDice for 2d index tensors (#2832)

  • Fixed mixed results of rouge_score with accumulate='best' (#2830)


[1.5.2] - 2024-11-07

[1.5.2] - Changed

  • Re-added numpy 2+ support (#2804)

[1.5.2] - Fixed

  • Fixed iou scores in detection for either empty predictions/targets leading to wrong scores (#2805)

  • Fixed MetricCollection compatibility with torch.jit.script (#2813)

  • Fixed assert in PIT (#2811)

  • Patched np.Inf for numpy 2.0+ (#2826)

[1.5.1] - 2024-10-22

[1.5.1] - Fixed

  • Fixed collection metrics failing due to the changed _modules dict type in Pytorch 2.5 (#2793)

[1.5.0] - 2024-10-18

[1.5.0] - Added

  • Added segmentation metric HausdorffDistance (#2122)

  • Added audio metric DNSMOS (#2525)

  • Added shape metric ProcrustesDistance (#2723)

  • Added MetricInputTransformer wrapper (#2392)

  • Added input_format argument to segmentation metrics (#2572)

  • Added multi-output support for MAE metric (#2605)

  • Added truncation argument to BERTScore (#2776)

[1.5.0] - Changed

  • Integrated higher_is_better handling into the Tracker wrapper (#2649)

  • Updated InfoLM class to dynamically set higher_is_better (#2674)

[1.5.0] - Deprecated

  • Deprecated num_outputs in R2Score (#2705)

[1.5.0] - Fixed

  • Fixed corner case in IoU metric for single empty prediction tensors (#2780)

  • Fixed PSNR calculation for integer type input images (#2788)


[1.4.3] - 2024-10-10

[1.4.3] - Fixed

  • Fixed Pearson metric changing its inputs (#2765)

  • Fixed bug in PESQ metric where NoUtterancesError prevented calculating on a batch of data (#2753)

  • Fixed corner case in MatthewsCorrCoef (#2743)

[1.4.2] - 2024-09-12

[1.4.2] - Added

  • Re-added Chrf implementation (#2701)

[1.4.2] - Fixed

  • Fixed wrong aggregation in segmentation.MeanIoU (#2698)

  • Fixed handling zero division error in binary IoU (Jaccard index) calculation (#2726)

  • Corrected the padding related calculation errors in SSIM (#2721)

  • Fixed compatibility of audio domain with new scipy (#2733)

  • Fixed how prefix/postfix works in MultitaskWrapper (#2722)

  • Fixed flakiness in tests related to torch.unique with dim=None (#2650)

[1.4.1] - 2024-08-02

[1.4.1] - Changed

  • Calculate text color of ConfusionMatrix plot based on luminance (#2590)

  • Updated _safe_divide to allow Accuracy to run on the GPU (#2640)

  • Improved error messages for intersection detection metrics for wrong user input (#2577)

[1.4.1] - Removed

  • Dropped Chrf implementation due to licensing issues with the upstream package (#2668)

[1.4.1] - Fixed

  • Fixed bug in MetricCollection when using compute groups and compute is called more than once (#2571)

  • Fixed class order of panoptic_quality(..., return_per_class=True) output (#2548)

  • Fixed BootstrapWrapper not being reset correctly (#2574)

  • Fixed integration between ClasswiseWrapper and MetricCollection with custom _filter_kwargs method (#2575)

  • Fixed BertScore calculation: pred target misalignment (#2347)

  • Fixed _cumsum helper function in multi-gpu (#2636)

  • Fixed bug in MeanAveragePrecision.coco_to_tm (#2588)

  • Fixed missed f-strings in exceptions/warnings (#2667)

[1.4.0] - 2024-05-03

[1.4.0] - Added

  • Added SensitivityAtSpecificity metric to classification subpackage (#2217)

  • Added QualityWithNoReference metric to image subpackage (#2288)

  • Added a new segmentation metric:

  • Added support for calculating segmentation quality and recognition quality in PanopticQuality metric (#2381)

  • Added pretty-errors for improving error prints (#2431)

  • Added support for torch.float weighted networks for FID and KID calculations (#2483)

  • Added zero_division argument to selected classification metrics (#2198)

[1.4.0] - Changed

  • Made __getattr__ and __setattr__ of ClasswiseWrapper more general (#2424)

[1.4.0] - Fixed

  • Fixed getitem for metric collection when prefix/postfix is set (#2430)

  • Fixed axis names with Precision-Recall curve (#2462)

  • Fixed list synchronization with partly empty lists (#2468)

  • Fixed memory leak in metrics using list states (#2492)

  • Fixed bug in computation of ERGAS metric (#2498)

  • Fixed BootStrapper wrapper not working with arguments provided as kwargs (#2503)

  • Fixed warnings being suppressed in MeanAveragePrecision when requested (#2501)

  • Fixed corner-case in binary_average_precision when only negative samples are provided (#2507)


[1.3.2] - 2024-03-18

[1.3.2] - Fixed

  • Fixed negative variance estimates in certain image metrics (#2378)

  • Fixed dtype being changed by deepspeed for certain regression metrics (#2379)

  • Fixed plotting of metric collection when prefix/postfix is set (#2429)

  • Fixed bug when top_k>1 and average="macro" for classification metrics (#2423)

  • Fixed case where label prediction tensors in classification metrics were not validated correctly (#2427)

  • Fixed how auc scores are calculated in PrecisionRecallCurve.plot methods (#2437)

[1.3.1] - 2024-02-12

[1.3.1] - Fixed

  • Fixed how backprop is handled in LPIPS metric (#2326)

  • Fixed MultitaskWrapper not being able to be logged in lightning when using metric collections (#2349)

  • Fixed high memory consumption in Perplexity metric (#2346)

  • Fixed cached network in FeatureShare not being moved to the correct device (#2348)

  • Fixed naming of statistics in MeanAveragePrecision with custom max det thresholds (#2367)

  • Fixed custom aggregation in retrieval metrics (#2364)

  • Fixed initialization of aggregation metrics with default floating type (#2366)

  • Fixed plotting of confusion matrices (#2358)

[1.3.0] - 2024-01-10

[1.3.0] - Added

  • Added more tokenizers for SacreBLEU metric (#2068)

  • Added support for logging MultiTaskWrapper directly with lightning's log_dict method (#2213)

  • Added FeatureShare wrapper to share submodules containing feature extractors between metrics (#2120)

  • Added new metrics to image domain:

    • SpatialDistortionIndex (#2260)

    • CriticalSuccessIndex (#2257)

    • Spatial Correlation Coefficient (#2248)

  • Added average argument to multiclass versions of PrecisionRecallCurve and ROC (#2084)

  • Added confidence scores when extended_summary=True in MeanAveragePrecision (#2212)

  • Added RetrievalAUROC metric (#2251)

  • Added aggregate argument to retrieval metrics (#2220)

  • Added utility functions in segmentation.utils for future segmentation metrics (#2105)

[1.3.0] - Changed

  • Changed minimum supported Pytorch version from 1.8 to 1.10 (#2145)

  • Changed x-/y-axis order for PrecisionRecallCurve to be consistent with scikit-learn (#2183)

[1.3.0] - Deprecated

  • Deprecated metric._update_called (#2141)

  • Deprecated specicity_at_sensitivity in favour of specificity_at_sensitivity (#2199)

[1.3.0] - Fixed

  • Fixed support for half precision + CPU in metrics requiring topk operator (#2252)

  • Fixed warning incorrectly being raised in Running metrics (#2256)

  • Fixed integration with custom feature extractor in FID metric (#2277)


[1.2.1] - 2023-11-30

[1.2.1] - Added

  • Added error if NoTrainInceptionV3 is being initialized without torch-fidelity being installed (#2143)

  • Added support for Pytorch v2.1 (#2142)

[1.2.1] - Changed

  • Changed default state of SpectralAngleMapper and UniversalImageQualityIndex to be tensors (#2089)

  • Use torch range func and repeat for deterministic bincount (#2184)

[1.2.1] - Removed

  • Removed unused lpips third-party package as dependency of LearnedPerceptualImagePatchSimilarity metric (#2230)

[1.2.1] - Fixed

  • Fixed numerical stability bug in LearnedPerceptualImagePatchSimilarity metric (#2144)

  • Fixed numerical stability issue in UniversalImageQualityIndex metric (#2222)

  • Fixed incompatibility for MeanAveragePrecision with pycocotools backend when too few max_detection_thresholds are provided (#2219)

  • Fixed support for half precision in Perplexity metric (#2235)

  • Fixed device and dtype for LearnedPerceptualImagePatchSimilarity functional metric (#2234)

  • Fixed bug in Metric._reduce_states(...) when using dist_sync_fn="cat" (#2226)

  • Fixed bug in CosineSimilarity where 2d is expected but 1d input was given (#2241)

  • Fixed bug in MetricCollection when using compute groups and compute is called more than once (#2211)

[1.2.0] - 2023-09-22

[1.2.0] - Added

  • Added metric to cluster package:

    • MutualInformationScore (#2008)

    • RandScore (#2025)

    • NormalizedMutualInfoScore (#2029)

    • AdjustedRandScore (#2032)

    • CalinskiHarabaszScore (#2036)

    • DunnIndex (#2049)

    • HomogeneityScore (#2053)

    • CompletenessScore (#2053)

    • VMeasureScore (#2053)

    • FowlkesMallowsIndex (#2066)

    • AdjustedMutualInfoScore (#2058)

    • DaviesBouldinScore (#2071)

  • Added backend argument to MeanAveragePrecision (#2034)
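
A minimal sketch of selecting the new backend argument, assuming the usual list-of-dicts input format of MeanAveragePrecision; the backend value shown is one of the options the metric is expected to accept:

    import torch
    from torchmetrics.detection import MeanAveragePrecision

    # select the COCO evaluation backend explicitly
    metric = MeanAveragePrecision(backend="pycocotools")
    preds = [{"boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),
              "scores": torch.tensor([0.9]),
              "labels": torch.tensor([0])}]
    target = [{"boxes": torch.tensor([[12.0, 10.0, 48.0, 52.0]]),
               "labels": torch.tensor([0])}]
    metric.update(preds, target)
    print(metric.compute()["map"])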


[1.1.2] - 2023-09-11

[1.1.2] - Fixed

  • Fixed tie breaking in ndcg metric (#2031)

  • Fixed bug in BootStrapper when very few samples were evaluated that could lead to crash (#2052)

  • Fixed bug when creating multiple plots that led to not all plots being shown (#2060)

  • Fixed performance issues in RecallAtFixedPrecision for large batch sizes (#2042)

  • Fixed bug related to MetricCollection used with custom metrics that have prefix/postfix attributes (#2070)

[1.1.1] - 2023-08-29

[1.1.1] - Added

  • Added average argument to MeanAveragePrecision (#2018)

[1.1.1] - Fixed

  • Fixed bug in PearsonCorrCoef when it is updated on single samples at a time (#2019)

  • Fixed support for pixel-wise MSE (#2017)

  • Fixed bug in MetricCollection when used with multiple metrics that return dicts with same keys (#2027)

  • Fixed bug in detection intersection metrics when class_metrics=True resulting in wrong values (#1924)

  • Fixed missing attributes higher_is_better, is_differentiable for some metrics (#2028)

[1.1.0] - 2023-08-22

[1.1.0] - Added

  • Added source aggregated signal-to-distortion ratio (SA-SDR) metric (#1882)

  • Added VisualInformationFidelity to image package (#1830)

  • Added EditDistance to text package (#1906)

  • Added top_k argument to RetrievalMRR in retrieval package (#1961)

  • Added support for evaluating "segm" and "bbox" detection in MeanAveragePrecision at the same time (#1928)

  • Added PerceptualPathLength to image package (#1939)

  • Added support for multioutput evaluation in MeanSquaredError (#1937)

  • Added argument extended_summary to MeanAveragePrecision such that precision, recall, iou can be easily returned (#1983)

  • Added warning to ClipScore if long captions are detected and truncated (#2001)

  • Added CLIPImageQualityAssessment to multimodal package (#1931)

  • Added new property metric_state to all metrics for users to investigate currently stored tensors in memory (#2006)
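
A small sketch of the new metric_state property; the exact state names printed depend on the metric, so the comment is only indicative:

    import torch
    from torchmetrics import MeanMetric

    metric = MeanMetric()
    metric.update(torch.tensor([1.0, 2.0, 3.0]))
    # metric_state exposes the currently stored state tensors as a dict,
    # e.g. the accumulated value and weight for MeanMetric
    print(metric.metric_state)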


[1.0.3] - 2023-08-08

[1.0.3] - Added

  • Added warning to MeanAveragePrecision if too many detections are observed (#1978)

[1.0.3] - Fixed

  • Fixed support for int input when multidim_average="samplewise" in classification metrics (#1977)

  • Fixed x/y labels when plotting confusion matrices (#1976)

  • Fixed IOU compute in cuda (#1982)

[1.0.2] - 2023-08-02

[1.0.2] - Added

  • Added warning to PearsonCorrCoef if input has a very small variance for its given dtype (#1926)

[1.0.2] - Changed

  • Changed all non-task specific classification metrics to be true subtypes of Metric (#1963)

[1.0.2] - Fixed

  • Fixed bug in CalibrationError where calculations for double precision input were performed in float precision (#1919)

  • Fixed bug related to the prefix/postfix arguments in MetricCollection and ClasswiseWrapper being duplicated (#1918)

  • Fixed missing AUC score when plotting classification metrics that support the score argument (#1948)

[1.0.1] - 2023-07-13

[1.0.1] - Fixed

  • Fixed corner case when using MetricCollection together with aggregation metrics (#1896)

  • Fixed the use of max_fpr in AUROC metric when only one class is present (#1895)

  • Fixed bug related to empty predictions for IntersectionOverUnion metric (#1892)

  • Fixed bug related to MeanMetric and broadcasting of weights when Nans are present (#1898)

  • Fixed bug related to expected input format of pycoco in MeanAveragePrecision (#1913)

[1.0.0] - 2023-07-04

[1.0.0] - Added

  • Added prefix and postfix arguments to ClasswiseWrapper (#1866)

  • Added speech-to-reverberation modulation energy ratio (SRMR) metric (#1792, #1872)

  • Added new global arg compute_with_cache to control caching behaviour after compute method (#1754)

  • Added ComplexScaleInvariantSignalNoiseRatio for audio package (#1785)

  • Added Running wrapper for calculating running statistics (#1752)

  • Added RelativeAverageSpectralError and RootMeanSquaredErrorUsingSlidingWindow to image package (#816)

  • Added support for SpecificityAtSensitivity Metric (#1432)

  • Added support for plotting of metrics through .plot() method (#1328, #1481, #1480, #1490, #1581, #1585, #1593, #1600, #1605, #1610, #1609, #1621, #1624, #1623, #1638, #1631, #1650, #1639, #1660, #1682, #1786); see the plotting sketch after this list

  • Added support for plotting of audio metrics through .plot() method (#1434)

  • Added classes to output from MAP metric (#1419)

  • Added Binary group fairness metrics to classification package (#1404)

  • Added MinkowskiDistance to regression package (#1362)

  • Added pairwise_minkowski_distance to pairwise package (#1362)

  • Added new detection metric PanopticQuality (#929, #1527)

  • Added PSNRB metric (#1421)

  • Added ClassificationTask Enum and use in metrics (#1479)

  • Added ignore_index option to exact_match metric (#1540)

  • Add parameter top_k to RetrievalMAP (#1501)

  • Added support for deterministic evaluation on GPU for metrics that uses torch.cumsum operator (#1499)

  • Added support for plotting of aggregation metrics through .plot() method (#1485)

  • Added support for python 3.11 (#1612)

  • Added support for auto clamping of input for metrics that use the data_range argument (#1606)

  • Added ModifiedPanopticQuality metric to detection package (#1627)

  • Added PrecisionAtFixedRecall metric to classification package (#1683)

  • Added multiple metrics to detection package (#1284)

    • IntersectionOverUnion

    • GeneralizedIntersectionOverUnion

    • CompleteIntersectionOverUnion

    • DistanceIntersectionOverUnion

  • Added MultitaskWrapper to wrapper package (#1762)

  • Added RelativeSquaredError metric to regression package (#1765)

  • Added MemorizationInformedFrechetInceptionDistance metric to image package (#1580)
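
As referenced in the plotting entry above, a minimal sketch of the .plot() method, assuming matplotlib is available; BinaryAccuracy is just a convenient example metric:

    import torch
    from torchmetrics.classification import BinaryAccuracy

    metric = BinaryAccuracy()
    metric.update(torch.rand(10), torch.randint(2, (10,)))
    # plot() returns a matplotlib figure and axis showing the current metric value
    fig, ax = metric.plot()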

[1.0.0] - Changed

  • Changed permutation_invariant_training to allow using a 'permutation-wise' metric function (#1794)

  • Changed update_count and update_called from private to public methods (#1370)

  • Raise exception for invalid kwargs in Metric base class (#1427)

  • Extended EnumStr to raise ValueError for an invalid value (#1479)

  • Improved speed and memory consumption of binned PrecisionRecallCurve with a large number of samples (#1493)

  • Changed __iter__ method from raising NotImplementedError to TypeError by setting to None (#1538)

  • FID metric will now raise an error if too few samples are provided (#1655)

  • Allowed FID with torch.float64 (#1628)

  • Changed LPIPS implementation to no longer rely on a third-party package (#1575)

  • Changed FID matrix square root calculation from scipy to torch (#1708)

  • Changed calculation in PearsonCorrCoef to be more robust in certain cases (#1729)

  • Changed MeanAveragePrecision to pycocotools backend (#1832)

[1.0.0] - Deprecated

[1.0.0] - Removed

  • Support for python 3.7 (#1640)

[1.0.0] - Fixed

  • Fixed support in MetricTracker for MultioutputWrapper and nested structures (#1608)

  • Fixed restrictive check in PearsonCorrCoef (#1649)

  • Fixed integration with jsonargparse and LightningCLI (#1651)

  • Fixed corner case in calibration error for zero confidence input (#1648)

  • Fixed precision-recall curve based computations for float target (#1642)

  • Fixed missing kwarg squeeze in MultiOutputWrapper (#1675)

  • Fixed padding removal for 3d input in MSSSIM (#1674)

  • Fixed max_det_threshold in MAP detection (#1712)

  • Fixed states being saved in metrics that use register_buffer (#1728)

  • Fixed states not being correctly synced and device transferred in MeanAveragePrecision for iou_type="segm" (#1763)

  • Fixed use of prefix and postfix in nested MetricCollection (#1773)

  • Fixed ax plotting logging in MetricCollection (#1783)

  • Fixed lookup for punkt sources being downloaded in RougeScore (#1789)

  • Fixed integration with lightning for CompositionalMetric (#1761)

  • Fixed several bugs in SpectralDistortionIndex metric (#1808)

  • Fixed bug for corner cases in MatthewsCorrCoef (#1812, #1863)

  • Fixed support for half precision in PearsonCorrCoef (#1819)

  • Fixed a number of bugs related to average="macro" in classification metrics (#1821)

  • Fixed off-by-one issue when ignore_index = num_classes + 1 in Multiclass-jaccard (#1860)


[0.11.4] - 2023-03-10

[0.11.4] - Fixed

  • Fixed evaluation of R2Score with near constant target (#1576)

  • Fixed dtype conversion when metric is submodule (#1583)

  • Fixed bug related to top_k>1 and ignore_index!=None in StatScores based metrics (#1589)

  • Fixed corner case for PearsonCorrCoef when running in ddp mode but only on single device (#1587)

  • Fixed overflow error for specific cases in MAP when big areas are calculated (#1607)

[0.11.3] - 2023-02-28

[0.11.3] - Fixed

  • Fixed classification metrics for byte input (#1521)

  • Fixed the use of ignore_index in MulticlassJaccardIndex (#1386)

[0.11.2] - 2023-02-21

[0.11.2] - Fixed

  • Fixed compatibility with XLA in the _bincount function (#1471)

  • Fixed type hints in methods belonging to MetricTracker wrapper (#1472)

  • Fixed multilabel in ExactMatch (#1474)

[0.11.1] - 2023-01-30

[0.11.1] - Fixed

  • Fixed type checking on the maximize parameter at the initialization of MetricTracker (#1428)

  • Fixed mixed precision autocast for SSIM metric (#1454)

  • Fixed checking for nltk.punkt in RougeScore if a machine is not online (#1456)

  • Fixed wrongly reset method in MultioutputWrapper (#1460)

  • Fixed dtype checking in PrecisionRecallCurve for target tensor (#1457)

[0.11.0] - 2022-11-30

[0.11.0] - Added

  • Added MulticlassExactMatch to classification metrics (#1343)

  • Added TotalVariation to image package (#978)

  • Added CLIPScore to new multimodal package (#1314)

  • Added regression metrics:

    • KendallRankCorrCoef (#1271)

    • LogCoshError (#1316)

  • Added new nominal metrics:

  • Added option to pass distributed_available_fn to metrics to allow checks for custom communication backends, making dist_sync_fn actually useful (#1301)

  • Added normalize argument to Inception, FID, KID metrics (#1246)

[0.11.0] - Changed

  • Changed minimum Pytorch version to be 1.8 (#1263)

  • Changed interface for all functional and modular classification metrics after refactor (#1252)

[0.11.0] - Removed

  • Removed deprecated BinnedAveragePrecision, BinnedPrecisionRecallCurve, RecallAtFixedPrecision (#1251)

  • Removed deprecated LabelRankingAveragePrecision, LabelRankingLoss and CoverageError (#1251)

  • Removed deprecated KLDivergence and AUC (#1251)

[0.11.0] - Fixed

  • Fixed precision bug in pairwise_euclidean_distance (#1352)


[0.10.3] - 2022-11-16

[0.10.3] - Fixed

  • Fixed bug in MetricTracker.best_metric when return_step=False (#1306)

  • Fixed bug to prevent users from going into an infinite loop if trying to iterate over a single metric (#1320)

[0.10.2] - 2022-10-31

[0.10.2] - Changed

  • Changed in-place operation to out-of-place operation in pairwise_cosine_similarity (#1288)

[0.10.2] - Fixed

  • Fixed high memory usage for certain classification metrics when average='micro' (#1286)

  • Fixed precision problems when structural_similarity_index_measure was used with autocast (#1291)

  • Fixed slow performance for confusion matrix based metrics (#1302)

  • Fixed restrictive dtype checking in spearman_corrcoef when used with autocast (#1303)

[0.10.1] - 2022-10-21

[0.10.1] - Fixed

  • Fixed broken clone method for classification metrics (#1250)

  • Fixed unintentional downloading of nltk.punkt when lsum not in rouge_keys (#1258)

  • Fixed type casting in MAP metric between bool and float32 (#1150)

[0.10.0] - 2022-10-04

[0.10.0] - Added

  • Added a new NLP metric InfoLM (#915)

  • Added Perplexity metric (#922)

  • Added ConcordanceCorrCoef metric to regression package (#1201)

  • Added argument normalize to LPIPS metric (#1216)

  • Added support for multiprocessing of batches in PESQ metric (#1227)

  • Added support for multioutput in PearsonCorrCoef and SpearmanCorrCoef (#1200)

[0.10.0] - Changed

[0.10.0] - Deprecated

  • Deprecated BinnedAveragePrecision, BinnedPrecisionRecallCurve, BinnedRecallAtFixedPrecision (#1163)

    • BinnedAveragePrecision -> use AveragePrecision with thresholds arg

    • BinnedPrecisionRecallCurve -> use PrecisionRecallCurve with thresholds arg

    • BinnedRecallAtFixedPrecision -> use RecallAtFixedPrecision with thresholds arg

  • Renamed and refactored LabelRankingAveragePrecision, LabelRankingLoss and CoverageError (#1167)

    • LabelRankingAveragePrecision -> MultilabelRankingAveragePrecision

    • LabelRankingLoss -> MultilabelRankingLoss

    • CoverageError -> MultilabelCoverageError

  • Deprecated KLDivergence and AUC from classification package (#1189)

    • KLDivergence moved to regression package

    • Instead of AUC use torchmetrics.utilities.compute.auc

[0.10.0] - Fixed

  • Fixed a bug in ssim when return_full_image=True where the score was still reduced (#1204)

  • Fixed MPS support for:

  • Fixed bug in ClasswiseWrapper such that compute gave wrong result (#1225)

  • Fixed synchronization of empty list states (#1219)


[0.9.3] - 2022-08-22

[0.9.3] - Added

  • Added global option sync_on_compute to disable automatic synchronization when compute is called (#1107)

[0.9.3] - Fixed

  • Fixed missing reset in ClasswiseWrapper (#1129)

  • Fixed JaccardIndex multi-label compute (#1125)

  • Fixed SSIM to propagate device if gaussian_kernel is False, and added test (#1149)

[0.9.2] - 2022-06-29

[0.9.2] - Fixed

  • Fixed mAP calculation for areas with 0 predictions (#1080)

  • Fixed bug where avg precision state and auroc state were not merged when using MetricCollections (#1086)

  • Skip box conversion if no boxes are present in MeanAveragePrecision (#1097)

  • Fixed inconsistency in docs and code when setting average="none" in AveragePrecision metric (#1116)

[0.9.1] - 2022-06-08

[0.9.1] - Added

  • Added specific RuntimeError when metric object is on the wrong device (#1056)

  • Added an option to specify own n-gram weights for BLEUScore and SacreBLEUScore instead of using uniform weights only. (#1075)

[0.9.1] - Fixed

  • Fixed aggregation metrics when input only contains zero (#1070)

  • Fixed TypeError when providing superclass arguments as kwargs (#1069)

  • Fixed bug related to state reference in metric collection when using compute groups (#1076)

[0.9.0] - 2022-05-30

[0.9.0] - Added

  • Added RetrievalPrecisionRecallCurve and RetrievalRecallAtFixedPrecision to retrieval package (#951)

  • Added class property full_state_update that determines whether forward should call update once or twice (#984, #1033); see the sketch after this list

  • Added support for nested metric collections (#1003)

  • Added Dice to classification package (#1021)

  • Added support for segmentation type segm as IoU for mean average precision (#822)
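
A sketch of the full_state_update class property mentioned above, using a toy custom metric; setting it to False is appropriate here because update does not depend on the accumulated global state:

    import torch
    from torchmetrics import Metric

    class SumOfSquares(Metric):
        # update only adds to a running total, so forward does not need a
        # second update call against the accumulated global state
        full_state_update = False

        def __init__(self):
            super().__init__()
            self.add_state("total", default=torch.tensor(0.0), dist_reduce_fx="sum")

        def update(self, x: torch.Tensor) -> None:
            self.total += (x ** 2).sum()

        def compute(self) -> torch.Tensor:
            return self.total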

[0.9.0] - Changed

  • Renamed reduction argument to average in Jaccard score and added additional options (#874)

[0.9.0] - Removed

[0.9.0] - Fixed

  • Fixed non-empty state dict for a few metrics (#1012)

  • Fixed bug when comparing states while finding compute groups (#1022)

  • Fixed torch.double support in stat score metrics (#1023)

  • Fixed FID calculation for non-equal size real and fake input (#1028)

  • Fixed case where KLDivergence could output Nan (#1030)

  • Fixed deterministic for PyTorch<1.8 (#1035)

  • Fixed default value for mdmc_average in Accuracy (#1036)

  • Fixed missing copy of property when using compute groups in MetricCollection (#1052)


[0.8.2] - 2022-05-06

[0.8.2] - Fixed

  • Fixed multi device aggregation in PearsonCorrCoef (#998)

  • Fixed MAP metric when using custom list of thresholds (#995)

  • Fixed compatibility between compute groups in MetricCollection and prefix/postfix arg (#1007)

  • Fixed compatibility with future Pytorch 1.12 in safe_matmul (#1011, #1014)

[0.8.1] - 2022-04-27

[0.8.1] - Changed

  • Reimplemented the signal_distortion_ratio metric, which removed the absolute requirement of fast-bss-eval (#964)

[0.8.1] - Fixed

  • Fixed “Sort currently does not support bool dtype on CUDA” error in MAP for empty preds (#983)

  • Fixed BinnedPrecisionRecallCurve when thresholds argument is not provided (#968)

  • Fixed CalibrationError to work on logit input (#985)

[0.8.0] - 2022-04-14

[0.8.0] - Added

  • Added WeightedMeanAbsolutePercentageError to regression package (#948)

  • Added new classification metrics:

    • CoverageError (#787)

    • LabelRankingAveragePrecision and LabelRankingLoss (#787)

  • Added new image metric:

    • SpectralAngleMapper (#885)

    • ErrorRelativeGlobalDimensionlessSynthesis (#894)

    • UniversalImageQualityIndex (#824)

    • SpectralDistortionIndex (#873)

  • Added support for MetricCollection in MetricTracker (#718)

  • Added support for 3D image and uniform kernel in StructuralSimilarityIndexMeasure (#818)

  • Added smart update of MetricCollection (#709)

  • Added ClasswiseWrapper for better logging of classification metrics with multiple output values (#832); see the sketch after this list

  • Added **kwargs argument for passing additional arguments to base class (#833)

  • Added negative ignore_index for the Accuracy metric (#362)

  • Added adaptive_k for the RetrievalPrecision metric (#910)

  • Added reset_real_features argument to image quality assessment metrics (#722)

  • Added new keyword argument compute_on_cpu to all metrics (#867)
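
As referenced in the ClasswiseWrapper entry above, a minimal sketch of wrapping a classification metric so per-class values come back as a dict; the task-specific MulticlassAccuracy class and the key naming in the comment follow the current API and are indicative only:

    import torch
    from torchmetrics.classification import MulticlassAccuracy
    from torchmetrics.wrappers import ClasswiseWrapper

    # average=None makes the wrapped metric return one value per class,
    # which the wrapper exposes as a dict (e.g. "multiclassaccuracy_0", ...)
    metric = ClasswiseWrapper(MulticlassAccuracy(num_classes=3, average=None))
    preds = torch.randn(10, 3).softmax(dim=-1)
    target = torch.randint(3, (10,))
    print(metric(preds, target))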

[0.8.0] - Changed

  • Made num_classes in jaccard_index a required argument (#853, #914)

  • Added normalizer, tokenizer to ROUGE metric (#838)

  • Improved shape checking of permutation_invariant_training (#864)

  • Allowed reduction None (#891)

  • MetricTracker.best_metric will now give a warning when computing on a metric that does not have a best value (#913)

[0.8.0] - Deprecated

  • Deprecated argument compute_on_step (#792)

  • Deprecated passing in dist_sync_on_step, process_group, dist_sync_fn as direct arguments (#833)

[0.8.0] - Removed

  • Removed support for versions of Pytorch-Lightning lower than v1.5 (#788)

  • Removed deprecated functions, and warnings in Text (#773)

    • WER and functional.wer

  • Removed deprecated functions and warnings in Image (#796)

    • SSIM and functional.ssim

    • PSNR and functional.psnr

  • Removed deprecated functions, and warnings in classification and regression (#806)

    • FBeta and functional.fbeta

    • F1 and functional.f1

    • Hinge and functional.hinge

    • IoU and functional.iou

    • MatthewsCorrcoef

    • PearsonCorrcoef

    • SpearmanCorrcoef

  • Removed deprecated functions, and warnings in detection and pairwise (#804)

    • MAP and functional.pairwise.manhatten

  • Removed deprecated functions, and warnings in Audio (#805)

    • PESQ and functional.audio.pesq

    • PIT and functional.audio.pit

    • SDR and functional.audio.sdr and functional.audio.si_sdr

    • SNR and functional.audio.snr and functional.audio.si_snr

    • STOI and functional.audio.stoi

  • Removed unused get_num_classes from torchmetrics.utilities.data (#914)

[0.8.0] - Fixed

  • Fixed device mismatch for MAP metric in specific cases (#950)

  • Improved testing speed (#820)

  • Fixed compatibility of ClasswiseWrapper with the prefix argument of MetricCollection (#843)

  • Fixed BestScore on GPU (#912)

  • Fixed Lsum computation for ROUGEScore (#944)


[0.7.3] - 2022-03-23

[0.7.3] - Fixed

  • Fixed unsafe log operation in TweedieDevianceScore for power=1 (#847)

  • Fixed bug in MAP metric related to either no ground truth or no predictions (#884)

  • Fixed ConfusionMatrix, AUROC and AveragePrecision on GPU when running in deterministic mode (#900)

  • Fixed NaN or Inf results returned by signal_distortion_ratio (#899)

  • Fixed memory leak when using update method with tensor where requires_grad=True (#902)

[0.7.2] - 2022-02-10

[0.7.2] - Fixed

  • Minor patches in JOSS paper.

[0.7.1] - 2022-02-03

[0.7.1] - Changed

  • Used torch.bucketize in calibration error when torch>1.8 for faster computations (#769)

  • Improved mAP performance (#742)

[0.7.1] - Fixed

  • Fixed check for available modules (#772)

  • Fixed Matthews correlation coefficient when the denominator is 0 (#781)

[0.7.0] - 2022-01-17

[0.7.0] - Added

  • Added NLP metrics:

    • MatchErrorRate (#619)

    • WordInfoLost and WordInfoPreserved (#630)

    • SQuAD (#623)

    • CHRFScore (#641)

    • TranslationEditRate (#646)

    • ExtendedEditDistance (#668)

  • Added MultiScaleSSIM into image metrics (#679)

  • Added Signal to Distortion Ratio (SDR) to audio package (#565)

  • Added MinMaxMetric to wrappers (#556)

  • Added ignore_index to retrieval metrics (#676)

  • Added support for multi references in ROUGEScore (#680)

  • Added a default VSCode devcontainer configuration (#621)

[0.7.0] - Changed

  • Scalar metrics will now consistently have additional dimensions squeezed (#622)

  • Metrics having third party dependencies removed from global import (#463)

  • BLEUScore now expects untokenized input to stay consistent with all the other text metrics (#640)

  • Reordered arguments for TER, BLEUScore, SacreBLEUScore, and CHRFScore; they now expect predictions first and target second (#696)

  • Changed dtype of metric state from torch.float to torch.long in ConfusionMatrix to accommodate larger values (#715)

  • Unified preds, target input argument naming across all text metrics (#723, #727)

    • bert, bleu, chrf, sacre_bleu, wip, wil, cer, ter, wer, mer, rouge, squad

[0.7.0] - Deprecated

  • Renamed IoU -> Jaccard Index (#662)

  • Renamed text WER metric (#714)

    • functional.wer -> functional.word_error_rate

    • WER -> WordErrorRate

  • Renamed correlation coefficient classes: (#710)

    • MatthewsCorrcoef -> MatthewsCorrCoef

    • PearsonCorrcoef -> PearsonCorrCoef

    • SpearmanCorrcoef -> SpearmanCorrCoef

  • Renamed audio STOI metric: (#753, #758)

    • audio.STOI to audio.ShortTimeObjectiveIntelligibility

    • functional.audio.stoi to functional.audio.short_time_objective_intelligibility

  • Renamed audio PESQ metrics: (#751)

    • functional.audio.pesq -> functional.audio.perceptual_evaluation_speech_quality

    • audio.PESQ -> audio.PerceptualEvaluationSpeechQuality

  • Renamed audio SDR metrics: (#711)

    • functional.sdr -> functional.signal_distortion_ratio

    • functional.si_sdr -> functional.scale_invariant_signal_distortion_ratio

    • SDR -> SignalDistortionRatio

    • SI_SDR -> ScaleInvariantSignalDistortionRatio

  • Renamed audio SNR metrics: (#712)

    • functional.snr -> functional.signal_noise_ratio

    • functional.si_snr -> functional.scale_invariant_signal_noise_ratio

    • SNR -> SignalNoiseRatio

    • SI_SNR -> ScaleInvariantSignalNoiseRatio

  • Renamed F-score metrics: (#731, #740); see the sketch after this list

    • functional.f1 -> functional.f1_score

    • F1 -> F1Score

    • functional.fbeta -> functional.fbeta_score

    • FBeta -> FBetaScore

  • Renamed Hinge metric: (#734)

    • functional.hinge -> functional.hinge_loss

    • Hinge -> HingeLoss

  • Renamed image PSNR metrics (#732)

    • functional.psnr -> functional.peak_signal_noise_ratio

    • PSNR -> PeakSignalNoiseRatio

  • Renamed image PIT metric: (#737)

    • functional.pit -> functional.permutation_invariant_training

    • PIT -> PermutationInvariantTraining

  • Renamed image SSIM metric: (#747)

    • functional.ssim -> functional.structural_similarity_index_measure

    • SSIM -> StructuralSimilarityIndexMeasure

  • Renamed detection MAP to MeanAveragePrecision metric (#754)

  • Renamed Fidelity & LPIPS image metric: (#752)

    • image.FID -> image.FrechetInceptionDistance

    • image.KID -> image.KernelInceptionDistance

    • image.LPIPS -> image.LearnedPerceptualImagePatchSimilarity
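
A short sketch of migrating to the renamed F-score interfaces from the list above; the task argument reflects the current classification API rather than the 0.7 one:

    import torch
    from torchmetrics import F1Score                 # formerly F1
    from torchmetrics.functional import f1_score     # formerly functional.f1

    preds = torch.tensor([0.9, 0.1, 0.8, 0.3])
    target = torch.tensor([1, 0, 1, 1])

    f1 = F1Score(task="binary")
    print(f1(preds, target), f1_score(preds, target, task="binary"))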

[0.7.0] - Removed

  • Removed embedding_similarity metric (#638)

  • Removed argument concatenate_texts from wer metric (#638)

  • Removed arguments newline_sep and decimal_places from rouge metric (#638)

[0.7.0] - Fixed

  • Fixed MetricCollection kwargs filtering when no kwargs are present in update signature (#707)


[0.6.2] - 2021-12-15

[0.6.2] - Fixed

  • Fixed 'torch.sort currently does not support bool dtype on CUDA' error (#665)

  • Fixed mAP to properly check if ground truths are empty (#684)

  • Fixed initialization of tensors to be on correct device for MAP metric (#673)

[0.6.1] - 2021-12-06

[0.6.1] - Changed

  • Migrate MAP metrics from pycocotools to PyTorch (#632)

  • Use torch.topk instead of torch.argsort in retrieval precision for speedup (#627)

[0.6.1] - Fixed

  • Fixed empty predictions in MAP metric (#594, #610, #624)

  • Fixed edge case of AUROC with average=weighted on GPU (#606)

  • Fixed forward in compositional metrics (#645)

[0.6.0] - 2021-10-28

[0.6.0] - Added

  • Added audio metrics:

    • Perceptual Evaluation of Speech Quality (PESQ) (#353)

    • Short-Time Objective Intelligibility (STOI) (#353)

  • Added Information retrieval metrics:

    • RetrievalRPrecision (#577)

    • RetrievalHitRate (#576)

  • Added NLP metrics:

    • SacreBLEUScore (#546)

    • CharErrorRate (#575)

  • Added other metrics:

    • Tweedie Deviance Score (#499)

    • Learned Perceptual Image Patch Similarity (LPIPS) (#431)

  • Added MAP (mean average precision) metric to new detection package (#467)

  • Added support for float targets in nDCG metric (#437)

  • Added average argument to AveragePrecision metric for reducing multi-label and multi-class problems (#477)

  • Added MultioutputWrapper (#510)

  • Added metric sweeping:

    • higher_is_better as constant attribute (#544)

    • higher_is_better to rest of codebase (#584)

  • Added simple aggregation metrics: SumMetric, MeanMetric, CatMetric, MinMetric, MaxMetric (#506)

  • Added pairwise submodule with metrics (#553)

    • pairwise_cosine_similarity

    • pairwise_euclidean_distance

    • pairwise_linear_similarity

    • pairwise_manhatten_distance
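
A minimal sketch of the new pairwise functions; pairwise_cosine_similarity returns an (N, M) matrix of similarities between the rows of its two inputs:

    import torch
    from torchmetrics.functional import pairwise_cosine_similarity

    x = torch.randn(4, 16)  # 4 samples with 16 features each
    y = torch.randn(3, 16)  # 3 samples with 16 features each
    sim = pairwise_cosine_similarity(x, y)
    print(sim.shape)  # torch.Size([4, 3])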

[0.6.0] - Changed

  • AveragePrecision will now as default output the macro average for multilabel and multiclass problems (#477)

  • half, double, float will no longer change the dtype of the metric states. Use metric.set_dtype instead (#493)

  • Renamed AverageMeter to MeanMetric (#506)

  • Changed is_differentiable from property to a constant attribute (#551)

  • ROC and AUROC will no longer throw an error when either the positive or negative class is missing; instead they return a score of 0 and give a warning

[0.6.0] - Deprecated

  • Deprecated functional.self_supervised.embedding_similarity in favour of new pairwise submodule

[0.6.0] - Removed

  • Removed dtype property (#493)

[0.6.0] - Fixed

  • Fixed bug in F1 with average='macro' and ignore_index!=None (#495)

  • Fixed bug in pit by using the returned first result to initialize device and type (#533)

  • Fixed SSIM metric using too much memory (#539)

  • Fixed bug where device property was not properly updated when metric was a child of a module (#542)


[0.5.1] - 2021-08-30

[0.5.1] - Added

  • Added device and dtype properties (#462)

  • Added TextTester class for robustly testing text metrics (#450)

[0.5.1] - Changed

  • Added support for float targets in nDCG metric (#437)

[0.5.1] - Removed

  • Removed rouge-score as dependency for text package (#443)

  • Removed jiwer as dependency for text package (#446)

  • Removed bert-score as dependency for text package (#473)

[0.5.1] - Fixed

  • Fixed ranking of samples in SpearmanCorrCoef metric (#448)

  • Fixed bug where compositional metrics were unable to sync because of type mismatch (#454)

  • Fixed metric hashing (#478)

  • Fixed BootStrapper metrics not working on GPU (#462)

  • Fixed the semantic ordering of kernel height and width in SSIM metric (#474)

[0.5.0] - 2021-08-09

[0.5.0] - Added

  • Added Text-related (NLP) metrics:

  • Added MetricTracker wrapper metric for keeping track of the same metric over multiple epochs (#238); see the sketch after this list

  • Added other metrics:

    • Symmetric Mean Absolute Percentage error (SMAPE) (#375)

    • Calibration error (#394)

    • Permutation Invariant Training (PIT) (#384)

  • Added support in nDCG metric for target with values larger than 1 (#349)

  • Added support for negative targets in nDCG metric (#378)

  • Added None as reduction option in CosineSimilarity metric (#400)

  • Allowed passing labels in (n_samples, n_classes) to AveragePrecision (#386)
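
As referenced in the MetricTracker entry above, a small sketch of tracking a metric across epochs; the task argument of Accuracy follows the current API, and best_metric with return_step=True is assumed to return both the best value and the step it occurred at:

    import torch
    from torchmetrics import Accuracy
    from torchmetrics.wrappers import MetricTracker

    tracker = MetricTracker(Accuracy(task="binary"))
    for epoch in range(3):
        tracker.increment()  # start a new tracked step (e.g. an epoch)
        tracker.update(torch.rand(8), torch.randint(2, (8,)))
    best_value, best_step = tracker.best_metric(return_step=True)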

[0.5.0] - Changed

  • Moved psnr and ssim from functional.regression.* to functional.image.* (#382)

  • Moved image_gradient from functional.image_gradients to functional.image.gradients (#381)

  • Moved R2Score from regression.r2score to regression.r2 (#371)

  • Pearson metric now only stores 6 statistics instead of all predictions and targets (#380)

  • Use torch.argmax instead of torch.topk when k=1 for better performance (#419)

  • Moved check for number of samples in R2 score to support single sample updating (#426)

[0.5.0] - Deprecated

  • Renamed r2score -> r2_score and kldivergence -> kl_divergence in functional (#371)

  • Moved bleu_score from functional.nlp to functional.text.bleu (#360)

[0.5.0] - Removed

  • Removed restriction that threshold has to be in (0,1) range to support logit input (#351, #401)

  • Removed restriction that preds could not be bigger than num_classes to support logit input (#357)

  • Removed module regression.psnr and regression.ssim (#382):

  • Removed (#379):

    • function functional.mean_relative_error

    • num_thresholds argument in BinnedPrecisionRecallCurve

[0.5.0] - Fixed

  • Fixed bug where classification metrics with average='macro' would lead to wrong result if a class was missing (#303)

  • Fixed weighted, multi-class AUROC computation to allow for 0 observations of some class, as contribution to final AUROC is 0 (#376)

  • Fixed that _forward_cache and _computed attributes are also moved to the correct device if metric is moved (#413)

  • Fixed calculation in IoU metric when using ignore_index argument (#328)


[0.4.1] - 2021-07-05

[0.4.1] - Changed

[0.4.1] - Fixed

  • Fixed DDP by adding is_sync logic to Metric (#339)

[0.4.0] - 2021-06-29

[0.4.0] - Added

  • Added Image-related metrics:

    • Fréchet inception distance (FID) (#213)

    • Kernel Inception Distance (KID) (#301)

    • Inception Score (#299)

    • KL divergence (#247)

  • Added Audio metrics: SNR, SI_SDR, SI_SNR (#292)

  • Added other metrics:

    • Cosine Similarity (#305)

    • Specificity (#210)

    • Mean Absolute Percentage error (MAPE) (#248)

  • Added add_metrics method to MetricCollection for adding additional metrics after initialization (#221)

  • Added pre-gather reduction in the case of dist_reduce_fx="cat" to reduce communication cost (#217)

  • Added better error message for AUROC when num_classes is not provided for multiclass input (#244)

  • Added support for unnormalized scores (e.g. logits) in Accuracy, Precision, Recall, FBeta, F1, StatScore, Hamming, ConfusionMatrix metrics (#200)

  • Added squared argument to MeanSquaredError for computing RMSE (#249); see the sketch after this list

  • Added is_differentiable property to ConfusionMatrix, F1, FBeta, Hamming, Hinge, IOU, MatthewsCorrcoef, Precision, Recall, PrecisionRecallCurve, ROC, StatScores (#253)

  • Added sync and sync_context methods for manually controlling when metric states are synced (#302)
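
As referenced in the MeanSquaredError entry above, a minimal sketch of computing RMSE via the new squared argument:

    import torch
    from torchmetrics import MeanSquaredError

    preds = torch.tensor([2.5, 0.0, 2.0, 8.0])
    target = torch.tensor([3.0, -0.5, 2.0, 7.0])

    rmse = MeanSquaredError(squared=False)  # squared=False returns the root of the MSE
    print(rmse(preds, target))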

[0.4.0] - Changed

  • Forward cache is reset when reset method is called (#260)

  • Improved per-class metric handling for imbalanced datasets for precision, recall, precision_recall, fbeta, f1, accuracy, and specificity (#204)

  • Decorated MetricCollection forward with torch.jit.unused (#307)

  • Renamed thresholds argument to binned metrics for manually controlling the thresholds (#322)

  • Extend typing (#324, #326, #327)

[0.4.0] - Deprecated

  • Deprecated functional.mean_relative_error, use functional.mean_absolute_percentage_error (#248)

  • Deprecated num_thresholds argument in BinnedPrecisionRecallCurve (#322)

[0.4.0] - Removed

  • Removed argument is_multiclass (#319)

[0.4.0] - Fixed

  • AUC can also support more dimensional inputs when all but one dimension are of size 1 (#242)

  • Fixed dtype of modular metrics after reset has been called (#243)

  • Fixed calculation in matthews_corrcoef to correctly match formula (#321)


[0.3.2] - 2021-05-10

[0.3.2] - Added

  • Added is_differentiable property:

    • To AUC, AUROC, CohenKappa and AveragePrecision (#178)

    • To PearsonCorrCoef, SpearmanCorrcoef, R2Score and ExplainedVariance (#225)

[0.3.2] - Changed

  • MetricCollection should return metrics with prefix on items(), keys() (#209)

  • Calling compute before update will now give a warning (#164)

[0.3.2] - Removed

  • Removed numpy as direct dependency (#212)

[0.3.2] - Fixed

  • Fixed auc calculation and added tests (#197)

  • Fixed loading persisted metric states using load_state_dict() (#202)

  • Fixed PSNR not working with DDP (#214)

  • Fixed metric calculation with unequal batch sizes (#220)

  • Fixed metric concatenation for list states for zero-dim input (#229)

  • Fixed numerical instability in AUROC metric for large input (#230)

[0.3.1] - 2021-04-21

  • Cleaned remaining inconsistencies and fixed PL develop integration (#191, #192, #193, #194)

[0.3.0] - 2021-04-20

[0.3.0] - Added

  • Added BootStrapper to easily calculate confidence intervals for metrics (#101)

  • Added Binned metrics (#128)

  • Added metrics for Information Retrieval (PL^5032):

    • RetrievalMAP (PL^5032)

    • RetrievalMRR (#119)

    • RetrievalPrecision (#139)

    • RetrievalRecall (#146)

    • RetrievalNormalizedDCG (#160)

    • RetrievalFallOut (#161)

  • Added other metrics:

    • CohenKappa (#69)

    • MatthewsCorrcoef (#98)

    • PearsonCorrcoef (#157)

    • SpearmanCorrcoef (#158)

    • Hinge (#120)

  • Added average='micro' as an option in AUROC for multilabel problems (#110)

  • Added multilabel support to ROC metric (#114)

  • Added testing for half precision (#77, #135 )

  • Added AverageMeter for ad-hoc averages of values (#138)

  • Added prefix argument to MetricCollection (#70)

  • Added __getitem__ as metric arithmetic operation (#142)

  • Added property is_differentiable to metrics and test for differentiability (#154)

  • Added support for average, ignore_index and mdmc_average in Accuracy metric (#166)

  • Added postfix arg to MetricCollection (#188)
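
A small sketch of the prefix and postfix arguments on MetricCollection listed above, using the current classification API for the member metrics:

    import torch
    from torchmetrics import MetricCollection
    from torchmetrics.classification import BinaryAccuracy, BinaryPrecision

    # prefix/postfix are prepended/appended to every key in the returned dict
    metrics = MetricCollection([BinaryAccuracy(), BinaryPrecision()], prefix="val_")
    out = metrics(torch.rand(8), torch.randint(2, (8,)))
    print(out)  # keys look like "val_BinaryAccuracy", "val_BinaryPrecision"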

[0.3.0] - Changed

  • Changed ExplainedVariance from storing all preds/targets to tracking 5 statistics (#68)

  • Changed behaviour of ConfusionMatrix for multilabel data to better match multilabel_confusion_matrix from sklearn (#134)

  • Updated FBeta arguments (#111)

  • Changed reset method to use detach().clone() instead of deepcopy when resetting to default (#163)

  • Metrics passed as dict to MetricCollection will now always be in deterministic order (#173)

  • Allowed passing metrics as arguments to MetricCollection (#176)

[0.3.0] - Deprecated

  • Rename argument is_multiclass -> multiclass (#162)

[0.3.0] - Removed

  • Pruned remaining deprecated code (#92)

[0.3.0] - Fixed

  • Fixed _stable_1d_sort to work when n >= N (PL^6177)

  • Fixed _computed attribute not being correctly reset (#147)

  • Fixed BLEU score (#165)

  • Fixed backwards compatibility for logging with older version of pytorch-lightning (#182)


[0.2.0] - 2021-03-12

[0.2.0] - Changed

  • Decoupled PL dependency (#13)

  • Refactored functional - mimic the module-like structure: classification, regression, etc. (#16)

  • Refactored utilities - split to topics/submodules (#14)

  • Refactored MetricCollection (#19)

[0.2.0] - Removed

  • Removed deprecated metrics from PL base (#12, #15)


[0.1.0] - 2021-02-22

  • Added top_k parameter to the Accuracy metric, generalizing it to Top-k accuracy for (multi-dimensional) multi-class inputs (PL^4838)

  • Added subset_accuracy parameter to the Accuracy metric, enabling the computation of subset accuracy for multi-label or multi-dimensional multi-class inputs (PL^4838)

  • Added HammingDistance metric to compute the hamming distance (loss) (PL^4838)

  • Added StatScores metric to compute the number of true positives, false positives, true negatives and false negatives (PL^4839)

  • Added R2Score metric (PL^5241)

  • Added MetricCollection (PL^4318)

  • Added .clone() method to metrics (PL^4318)

  • Added IoU class interface (PL^4704)

  • The Recall and Precision metrics (and their functional counterparts recall and precision) can now be generalized to Recall@K and Precision@K with the use of top_k parameter (PL^4842)

  • Added compositional metrics (PL^5464)

  • Added AUC/AUROC class interface (PL^5479)

  • Added QuantizationAwareTraining callback (PL^5706)

  • Added ConfusionMatrix class interface (PL^4348)

  • Added multiclass AUROC metric (PL^4236)

  • Added PrecisionRecallCurve, ROC, AveragePrecision class metric (PL^4549)

  • Classification metrics overhaul (PL^4837)

  • Added F1 class metric (PL^4656)

  • Added metrics aggregation in Horovod and fixed early stopping (PL^3775)

  • Added persistent(mode) method to metrics, to enable and disable metric states being added to state_dict (PL^4482)

  • Added unification of regression metrics (PL^4166)

  • Added persistent flag to Metric.add_state (PL^4195)

  • Added classification metrics (PL^4043)

  • Added new Metrics API. (PL^3868, PL^3921)

  • Added EMB similarity (PL^3349)

  • Added SSIM metrics (PL^2671)

  • Added BLEU metrics (PL^2535)