Paper title
Classification of Web Phishing Kits for early detection by platform providers
Paper authors
Paper abstract
Phishing kits are tools that experts in the criminal underground supply to the community of criminal phishers to facilitate the construction of malicious Web sites. As these kits grow more sophisticated, providers of Web-based services need to keep pace with their increasing complexity. We present an original classification of a corpus of over 2000 recent phishing kits according to the evasion and obfuscation functions they adopt. We first carry out a deterministic analysis of the kits' source code to extract the most discriminant features and information about their principal authors. We then complement this initial classification with supervised machine learning models. Using the ground truth obtained in the first step, we can demonstrate whether, and which, machine learning models are able to correctly classify even kits that adopt novel evasion and obfuscation techniques unseen during the training phase. We compare different algorithms and evaluate their robustness in the realistic case in which only a small number of phishing kits are available for training. This paper represents an initial but important step toward supporting Web service providers and analysts in improving early detection mechanisms and intelligence operations for the phishing kits that might be installed on their platforms.
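As an illustration of what a deterministic source-code analysis step might look like, the following minimal sketch maps a kit's source text to a binary feature vector of evasion and obfuscation indicators. The pattern names and regular expressions in `FEATURE_PATTERNS` are hypothetical examples chosen for this sketch; the abstract does not list the actual features used, so they should be read as assumptions rather than as the authors' method.

```python
import re

# Hypothetical indicator patterns (assumptions for illustration only);
# the abstract does not specify the real feature set.
FEATURE_PATTERNS = {
    # Obfuscation: encoded payloads decoded and executed at run time
    "base64_decode": re.compile(r"\bbase64_decode\s*\("),
    "eval_call": re.compile(r"\beval\s*\("),
    # Evasion: filtering visitors by user agent or IP address
    "user_agent_check": re.compile(r"HTTP_USER_AGENT"),
    "ip_check": re.compile(r"REMOTE_ADDR"),
    # Evasion: access rules such as "deny from ..." in an .htaccess file
    "htaccess_deny": re.compile(r"^\s*deny\s+from\b", re.IGNORECASE | re.MULTILINE),
}


def extract_features(source):
    """Return a dict mapping each feature name to 1 if its pattern occurs, else 0."""
    return {
        name: int(bool(pattern.search(source)))
        for name, pattern in FEATURE_PATTERNS.items()
    }


# A made-up PHP fragment standing in for one file of a kit
sample = """<?php
if (strpos($_SERVER['HTTP_USER_AGENT'], 'bot') !== false) { exit; }
eval(base64_decode($payload));
"""

features = extract_features(sample)
print(features)
```

Vectors of this kind could then serve as labeled inputs to a supervised classifier, which is the role the deterministic step plays as ground truth in the approach described above.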