Paper title
An Empirical Analysis of SMS Scam Detection Systems
Paper authors
Paper abstract
The short message service (SMS) was introduced to mobile phone users a generation ago. SMS makes up the world's oldest large-scale network, with billions of users, and therefore attracts a lot of fraud. Due to the convergence of mobile networks with the Internet, SMS-based scams can potentially compromise the security of Internet services as well. In this study, we present a new SMS scam dataset consisting of 153,551 SMSes. This dataset, which we will release publicly for research purposes, represents the largest publicly available SMS scam dataset. We evaluate and compare the performance achieved by several established machine learning methods on the new dataset, ranging from shallow machine learning approaches to deep neural networks to syntactic and semantic feature models. We then study the existing models from an adversarial viewpoint by assessing their robustness against different levels of adversarial manipulation. This perspective consolidates the current state of the art in SMS spam filtering and highlights both the limitations of existing approaches and the opportunities to improve them.