Paper Title
Calibration tests beyond classification
Paper Authors
Paper Abstract
Most supervised machine learning tasks are subject to irreducible prediction errors. Probabilistic predictive models address this limitation by providing probability distributions that represent a belief over plausible targets, rather than point estimates. Such models can be a valuable tool in decision-making under uncertainty, provided that the model output is meaningful and interpretable. Calibrated models guarantee that the probabilistic predictions are neither over- nor under-confident. In the machine learning literature, different measures and statistical tests have been proposed and studied for evaluating the calibration of classification models. For regression problems, however, research has been focused on a weaker condition of calibration based on predicted quantiles for real-valued targets. In this paper, we propose the first framework that unifies calibration evaluation and tests for general probabilistic predictive models. It applies to any such model, including classification and regression models of arbitrary dimension. Furthermore, the framework generalizes existing measures and provides a more intuitive reformulation of a recently proposed framework for calibration in multi-class classification. In particular, we reformulate and generalize the kernel calibration error, its estimators, and hypothesis tests using scalar-valued kernels, and evaluate the calibration of real-valued regression problems.
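To make the abstract's last point concrete, here is a minimal, hedged sketch of an unbiased estimator of a squared kernel calibration error (SKCE) for real-valued regression with Gaussian predictive models. It is not the authors' reference implementation; the kernel choices are illustrative assumptions: a tensor-product kernel k((p, y), (p', y')) = k_p(p, p') * k_y(y, y'), with a Gaussian RBF kernel on targets (bandwidth gamma) whose expectations under Gaussian predictions have closed forms, and an RBF kernel on the 2-Wasserstein distance between the predicted Gaussians (bandwidth nu).

```python
# Hedged sketch: unbiased SKCE estimator for real-valued regression with
# Gaussian predictive distributions N(mu_i, sigma_i^2). Kernel choices and
# bandwidths (gamma, nu) are illustrative assumptions, not the paper's setup.
import numpy as np


def rbf_expectations(mu, sigma, y, gamma):
    """Closed-form Gaussian-RBF expectations under Gaussian predictions."""
    var = sigma ** 2
    # E_{Z~p_i, Z'~p_j}[k_y(Z, Z')]
    s2 = gamma ** 2 + var[:, None] + var[None, :]
    e_zz = gamma / np.sqrt(s2) * np.exp(-(mu[:, None] - mu[None, :]) ** 2 / (2 * s2))
    # E_{Z~p_i}[k_y(Z, y_j)]
    t2 = gamma ** 2 + var[:, None]
    e_zy = gamma / np.sqrt(t2) * np.exp(-(mu[:, None] - y[None, :]) ** 2 / (2 * t2))
    # k_y(y_i, y_j)
    k_yy = np.exp(-(y[:, None] - y[None, :]) ** 2 / (2 * gamma ** 2))
    return e_zz, e_zy, k_yy


def skce_unbiased(mu, sigma, y, gamma=1.0, nu=1.0):
    """Unbiased estimate: average the pairwise terms h_ij over all i < j."""
    e_zz, e_zy, k_yy = rbf_expectations(mu, sigma, y, gamma)
    # h_ij = E[k_y(Z,Z')] - E[k_y(Z,y_j)] - E[k_y(y_i,Z')] + k_y(y_i,y_j)
    h = e_zz - e_zy - e_zy.T + k_yy
    # kernel on predictions via the 2-Wasserstein distance between Gaussians
    w2sq = (mu[:, None] - mu[None, :]) ** 2 + (sigma[:, None] - sigma[None, :]) ** 2
    h *= np.exp(-w2sq / (2 * nu ** 2))
    iu = np.triu_indices(len(mu), k=1)
    return h[iu].mean()


# Toy check: targets drawn from the model's own predictions (calibrated)
# should give an estimate near zero; an overconfident model should not.
rng = np.random.default_rng(0)
mu = rng.normal(size=500)
sigma = np.full(500, 1.0)
y_calibrated = rng.normal(mu, sigma)           # noise matches the predictions
y_overconfident = rng.normal(mu, 2.0 * sigma)  # true noise twice as large
print(skce_unbiased(mu, sigma, y_calibrated))
print(skce_unbiased(mu, sigma, y_overconfident))
```

Because the estimator is a U-statistic of the pairwise terms h_ij, it is unbiased and its estimates can be negative for calibrated models; a hypothesis test of calibration would compare the statistic against its distribution under the null of calibration rather than against zero directly.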