Paper Title

Scalable Program Clone Search Through Spectral Analysis

Authors

Tristan Benoit, Jean-Yves Marion, Sébastien Bardin

Abstract

We consider the problem of program clone search: given a target program and a repository of known programs (all in executable format), the goal is to find the program in the repository that is most similar to the target program. Potential applications include reverse engineering, program clustering, malware lineage and software theft detection. Recent years have witnessed a blooming of code similarity techniques, yet most of them focus on function-level similarity and function clone search, while we are interested in program-level similarity and program clone search. In fact, our study shows that prior similarity approaches are either too slow to handle large program repositories, not precise enough, or not robust against slight variations introduced by compilers, source code versions or light obfuscations. We propose a novel spectral analysis method for program-level similarity and program clone search called Programs Spectral Similarity (PSS). In a nutshell, the one-time spectral feature extraction of PSS is tailored for large repositories, making it a perfect fit for program clone search. We have compared the different approaches on extensive benchmarks, showing that PSS reaches a sweet spot in terms of precision, speed and robustness.
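
The abstract does not spell out how PSS computes its features, so the following is only a minimal sketch of the general idea of "one-time spectral feature extraction, then repository lookup", and not the authors' method. It assumes each program has already been reduced to a small undirected graph (for instance a call graph; this is an assumption). As a stand-in spectral signature it uses the power sums of the graph Laplacian eigenvalues, computed as trace(L^p) so that no linear algebra library is needed. All function names (`laplacian`, `spectral_signature`, `build_index`, `closest`) and the toy repository are hypothetical.

```python
from math import dist


def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected simple graph on n nodes."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:  # each undirected edge is listed once
        L[u][v] -= 1
        L[v][u] -= 1
        L[u][u] += 1
        L[v][v] += 1
    return L


def matmul(a, b):
    """Plain square-matrix product; fine for toy sizes."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_signature(n, edges, k=4):
    """Return [trace(L^1), ..., trace(L^k)].

    trace(L^p) equals the sum of the p-th powers of the Laplacian
    eigenvalues, so the vector depends only on the spectrum and is
    unchanged when nodes are relabelled.
    """
    L = laplacian(n, edges)
    power, sig = L, []
    for _ in range(k):
        sig.append(sum(power[i][i] for i in range(n)))
        power = matmul(power, L)
    return sig


def build_index(repo, k=4):
    """One-time step: compute and store a signature per repository program."""
    return {name: spectral_signature(n, edges, k)
            for name, (n, edges) in repo.items()}


def closest(index, target_sig):
    """Query step: nearest stored signature by Euclidean distance."""
    return min(index, key=lambda name: dist(index[name], target_sig))


# Hypothetical toy repository: name -> (node count, edge list).
repo = {
    "path": (4, [(0, 1), (1, 2), (2, 3)]),
    "star": (4, [(0, 1), (0, 2), (0, 3)]),
    "cycle": (4, [(0, 1), (1, 2), (2, 3), (3, 0)]),
}
index = build_index(repo)

# Target: the star graph with its nodes renumbered.
target = spectral_signature(4, [(2, 0), (2, 1), (2, 3)])
print(closest(index, target))
```

The point of the split between `build_index` and `closest` is that the expensive part runs once per repository program, while each query only compares short vectors. A real system would need a richer signature and a scalable nearest-neighbour structure; neither is described in the abstract.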
