Paper Title
Multi-label Contrastive Predictive Coding
Paper Authors
Paper Abstract
Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC). A lower bound on MI can be obtained from a multi-class classification problem, where a critic attempts to distinguish a positive sample drawn from the underlying joint distribution from $(m-1)$ negative samples drawn from a suitable proposal distribution. Using this approach, MI estimates are bounded above by $\log m$, and can thus severely underestimate the true MI unless $m$ is very large. To overcome this limitation, we introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time. We show that, using the same number of negative samples, multi-label CPC is able to exceed the $\log m$ bound while still being a valid lower bound on mutual information. We demonstrate that the proposed approach leads to better mutual information estimation, yields empirical improvements in unsupervised representation learning, and beats a current state-of-the-art knowledge distillation method on 10 out of 13 tasks.
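The $\log m$ ceiling of the multi-class bound described above can be seen in a small sketch. The function and critic below are illustrative assumptions (not the paper's code): `infonce_bound` is the standard multi-class CPC (InfoNCE) estimate computed from a critic score matrix, and the bilinear critic $f(x, y) = xy$ is just one simple choice for correlated Gaussian pairs — any critic yields a valid lower bound.

```python
import numpy as np

def infonce_bound(scores):
    """Multi-class CPC (InfoNCE) MI estimate from an (m, m) critic score matrix.

    scores[i, i] is the critic value f(x_i, y_i) for the positive pair; the
    off-diagonal entries of row i act as the (m - 1) negative samples. The
    estimate is log m + mean_i [f(x_i, y_i) - logsumexp_j f(x_i, y_j)], which
    can never exceed log m, since each positive term also appears inside the
    logsumexp over its own row.
    """
    m = scores.shape[0]
    row_max = scores.max(axis=1, keepdims=True)  # subtract max for numerical stability
    lse = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    return np.log(m) + np.mean(np.diag(scores) - lse)

# Toy check on correlated Gaussian pairs with a simple bilinear critic
# f(x, y) = x * y (a hypothetical choice for this illustration).
rng = np.random.default_rng(0)
m, rho = 128, 0.9
x = rng.standard_normal(m)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(m)
estimate = infonce_bound(np.outer(x, y))
assert estimate <= np.log(m) + 1e-9  # the log m ceiling the abstract refers to
```

Even a near-perfect critic saturates at $\log m$: with $m = 128$ the estimate cannot exceed $\log 128 \approx 4.85$ nats, however large the true MI is. This is the limitation that the multi-label estimator proposed in the paper is designed to overcome.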