Paper Title
Supervised Learning: No Loss No Cry
Paper Authors
Paper Abstract
Supervised learning requires the specification of a loss function to minimise. While the theory of admissible losses from both a computational and statistical perspective is well-developed, these theories offer a panoply of different choices. In practice, this choice is typically made in an \emph{ad hoc} manner. In hopes of making this procedure more principled, the problem of \emph{learning the loss function} for a downstream task (e.g., classification) has garnered recent interest. However, works in this area have been generally empirical in nature. In this paper, we revisit the {\sc SLIsotron} algorithm of Kakade et al. (2011) through a novel lens, derive a generalisation based on Bregman divergences, and show how it provides a principled procedure for learning the loss. In detail, we cast {\sc SLIsotron} as learning a loss from a family of composite square losses. By interpreting this through the lens of \emph{proper losses}, we derive a generalisation of {\sc SLIsotron} based on Bregman divergences. The resulting {\sc BregmanTron} algorithm jointly learns the loss along with the classifier. It comes equipped with a simple guarantee of convergence for the loss it learns, and its set of possible outputs comes with a guarantee of agnostic approximability of Bayes rule. Experiments indicate that the {\sc BregmanTron} substantially outperforms the {\sc SLIsotron}, and that the loss it learns can be minimised by other algorithms for different tasks, thereby opening the interesting problem of \textit{loss transfer} between domains.
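To make the passage from composite square losses to Bregman divergences concrete, the display below recalls the standard definition of a Bregman divergence and how the square loss arises as its simplest instance; the generator symbol $\phi$ and this presentation are illustrative and not the paper's own notation.

% Bregman divergence generated by a differentiable convex function \phi
% (standard definition; symbols are illustrative, not the paper's notation)
\[
  D_{\phi}(y \,\|\, \hat{y}) \;=\; \phi(y) - \phi(\hat{y}) - (y - \hat{y})\,\phi'(\hat{y}).
\]
% Fixing the generator to \phi(t) = t^2 recovers the square loss,
% the case underlying the composite square losses that {\sc SLIsotron} is cast as learning from:
\[
  \phi(t) = t^{2} \quad\Longrightarrow\quad D_{\phi}(y \,\|\, \hat{y}) = (y - \hat{y})^{2}.
\]

In this reading, allowing the generator itself to be learned, rather than fixed to $t \mapsto t^{2}$, is what turns the fixed square loss into a learned loss, which is the sense in which the abstract describes the {\sc BregmanTron} as jointly learning the loss along with the classifier.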