Paper Title

Aries: Efficient Testing of Deep Neural Networks via Labeling-Free Accuracy Estimation

Authors

Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Lei Ma, Mike Papadakis, Yves Le Traon

Abstract

Deep learning (DL) plays an increasingly important role in our daily life due to its competitive performance in industrial application domains. As the core of DL-enabled systems, deep neural networks (DNNs) need to be carefully evaluated to ensure that the produced models match the expected requirements. In practice, the de facto standard for assessing the quality of DNNs in industry is to check their performance (accuracy) on a collected set of labeled test data. However, preparing such labeled data is often difficult, partly because data labeling is labor-intensive, especially given the massive amount of new unlabeled data arriving every day. Recent studies show that test selection for DNNs is a promising direction that tackles this issue by selecting a minimal representative subset of data to label and using it to assess the model. However, it still requires human effort and cannot be fully automated. In this paper, we propose a novel technique, named Aries, that estimates the performance of DNNs on new unlabeled data using only the information obtained from the original test data. The key insight behind our technique is that the model should have similar prediction accuracy on data that lie at similar distances from the decision boundary. We performed a large-scale evaluation of our technique on two well-known datasets, CIFAR-10 and Tiny-ImageNet, with four widely studied DNN models, including ResNet101 and DenseNet121, and 13 types of data transformation methods. Results show that the accuracy estimated by Aries is only 0.03% to 2.60% off the true accuracy. In addition, Aries outperforms the state-of-the-art labeling-free methods in 50 out of 52 cases and selection-labeling-based methods in 96 out of 128 cases.
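The key insight in the abstract can be illustrated with a small sketch. This is not the Aries algorithm itself, which the abstract does not spell out. It is a simplified illustration under two assumptions: that distance to the decision boundary can be approximated by the margin between the top two predicted probabilities, and that inputs can be grouped into equal-width margin buckets. The function names `margin_scores` and `estimate_accuracy` and the parameter `n_buckets` are hypothetical.

```python
import numpy as np


def margin_scores(probs):
    """Assumed proxy for distance to the decision boundary:
    top-1 probability minus top-2 probability (larger = farther away)."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]


def estimate_accuracy(test_probs, test_labels, new_probs, n_buckets=10):
    """Estimate accuracy on unlabeled data from labeled test data.

    Labeled test inputs are grouped into margin buckets and the accuracy of
    each bucket is measured. Every unlabeled input then inherits the accuracy
    of the bucket its own margin falls into, and the estimate is the mean.
    """
    # Interior bucket edges on [0, 1]; digitize maps margins to 0..n_buckets-1.
    edges = np.linspace(0.0, 1.0, n_buckets + 1)[1:-1]
    test_bucket = np.digitize(margin_scores(test_probs), edges)
    new_bucket = np.digitize(margin_scores(new_probs), edges)

    test_correct = test_probs.argmax(axis=1) == np.asarray(test_labels)

    # Buckets with no labeled data fall back to the overall test accuracy.
    bucket_acc = np.full(n_buckets, test_correct.mean(), dtype=float)
    for b in range(n_buckets):
        mask = test_bucket == b
        if mask.any():
            bucket_acc[b] = test_correct[mask].mean()

    return float(bucket_acc[new_bucket].mean())
```

Here `test_probs` and `new_probs` would be the softmax outputs of the same model on the labeled test set and on the new unlabeled data. No labels are needed for the new data, which is the point of a labeling-free estimate.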
