Paper Title

Multi-Class Classification from Noisy-Similarity-Labeled Data

Authors

Songhua Wu, Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Nannan Wang, Haifeng Liu, Gang Niu

Abstract

A similarity label indicates whether two instances belong to the same class, while a class label gives the class of an instance. Without class labels, a multi-class classifier can be learned from similarity-labeled pairwise data by meta classification learning. However, since the similarity label is less informative than the class label, it is more likely to be noisy. Deep neural networks can easily memorize noisy data, leading to overfitting in classification. In this paper, we propose a method for learning from only noisy-similarity-labeled data. Specifically, to model the noise, we employ a noise transition matrix to bridge the class-posterior probability between clean and noisy data. We further estimate the transition matrix from only noisy data and build a novel learning system to learn a classifier which can assign noise-free class labels to instances. Moreover, we theoretically justify how our proposed method generalizes for learning classifiers. Experimental results demonstrate the superiority of the proposed method over state-of-the-art methods on benchmark-simulated and real-world noisy-label datasets.
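The abstract does not spell out the model, so the following is only a minimal sketch of one plausible reading: a row-stochastic transition matrix maps clean class posteriors to noisy ones, and the probability that a pair is labeled "similar" is taken as the inner product of the two noisy posteriors. The function names, the matrix convention `T[i, j] = P(noisy = j | clean = i)`, and the binary cross-entropy loss are assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

def noisy_class_posterior(clean_posterior, T):
    """Map a clean class posterior to a noisy one.

    Assumed convention: T[i, j] = P(noisy label = j | clean label = i),
    so each row of T sums to 1 and p_noisy[j] = sum_i p_clean[i] * T[i, j].
    """
    return np.asarray(clean_posterior) @ np.asarray(T)

def similarity_posterior(p_a, p_b):
    """Probability that two instances carry the same label, assuming
    their labels are independent given the inputs (inner product)."""
    return float(np.dot(p_a, p_b))

def pairwise_bce(p_sim, s, eps=1e-12):
    """Binary cross-entropy between a predicted similarity probability
    and an observed (possibly noisy) similarity label s in {0, 1}."""
    p = float(np.clip(p_sim, eps, 1.0 - eps))
    return -(s * np.log(p) + (1 - s) * np.log(1.0 - p))

# Hypothetical 3-class example with symmetric label noise.
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
p_clean = np.array([1.0, 0.0, 0.0])        # classifier is certain of class 0
p_noisy = noisy_class_posterior(p_clean, T)
sim = similarity_posterior(p_noisy, p_noisy)
loss = pairwise_bce(sim, s=1)
```

In this sketch the classifier itself outputs clean posteriors, and the transition matrix is applied only when comparing against noisy similarity labels, so the noise is absorbed by `T` rather than by the classifier. How the paper estimates `T` from noisy data is not covered here.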
