Paper title
Transcending Transcend: Revisiting Malware Classification in the Presence of Concept Drift
Paper authors
Paper abstract
Machine learning for malware classification shows encouraging results, but real deployments suffer from performance degradation as malware authors adapt their techniques to evade detection. This phenomenon, known as concept drift, occurs as new malware examples evolve and become less and less like the original training examples. One promising method to cope with concept drift is classification with rejection in which examples that are likely to be misclassified are instead quarantined until they can be expertly analyzed. We propose TRANSCENDENT, a rejection framework built on Transcend, a recently proposed strategy based on conformal prediction theory. In particular, we provide a formal treatment of Transcend, enabling us to refine conformal evaluation theory -- its underlying statistical engine -- and gain a better understanding of the theoretical reasons for its effectiveness. In the process, we develop two additional conformal evaluators that match or surpass the performance of the original while significantly decreasing the computational overhead. We evaluate TRANSCENDENT on a malware dataset spanning 5 years that removes sources of experimental bias present in the original evaluation. TRANSCENDENT outperforms state-of-the-art approaches while generalizing across different malware domains and classifiers. To further assist practitioners, we determine the optimal operational settings for a TRANSCENDENT deployment and show how it can be applied to many popular learning algorithms. These insights support both old and new empirical findings, making Transcend a sound and practical solution for the first time. To this end, we release TRANSCENDENT as open source, to aid the adoption of rejection strategies by the security community.
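To make the idea of classification with rejection concrete, the following is a minimal sketch of a conformal-style rejection rule. It is not the TRANSCENDENT implementation or its API: the nearest-centroid classifier, the distance-based nonconformity measure, the toy one-dimensional data, the function names, and the fixed threshold of 0.3 are all assumptions chosen for illustration. The test example's p-value is the fraction of held-out calibration examples of the predicted class that look at least as unusual as it does; a low p-value suggests drift, and the example is quarantined instead of classified.

```python
from statistics import mean

def fit_centroids(xs, ys):
    """Return {label: centroid} for 1-D features (toy classifier)."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def nonconformity(centroids, x, label):
    """Distance to the centroid of `label`; larger means more unusual."""
    return abs(x - centroids[label])

def p_value(cal_scores, score):
    """Fraction of calibration scores at least as large as `score`."""
    return sum(s >= score for s in cal_scores) / len(cal_scores)

def classify_with_rejection(centroids, cal_scores_by_label, x, threshold):
    """Return (label, p) if accepted, or (None, p) if quarantined."""
    label = predict(centroids, x)
    score = nonconformity(centroids, x, label)
    p = p_value(cal_scores_by_label[label], score)
    return (label if p >= threshold else None), p

# Toy training data: 0 = goodware, 1 = malware (made-up feature values).
train_x = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
train_y = [0, 0, 0, 1, 1, 1]
centroids = fit_centroids(train_x, train_y)

# Held-out calibration split used only to compute reference scores.
cal_x = [0.5, 1.5, 2.5, 7.5, 9.5, 10.5]
cal_y = [0, 0, 0, 1, 1, 1]
cal_scores = {
    c: [nonconformity(centroids, x, c) for x, y in zip(cal_x, cal_y) if y == c]
    for c in centroids
}

# An example close to the training data is accepted with its label.
in_dist = classify_with_rejection(centroids, cal_scores, 1.2, threshold=0.3)
# An example far from everything seen so far is quarantined (label None).
drifted = classify_with_rejection(centroids, cal_scores, 20.0, threshold=0.3)
print(in_dist, drifted)
```

In practice the threshold would be tuned on calibration data to balance how many examples are rejected against the error rate on the examples that are kept, rather than fixed by hand as it is here.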