Paper Title

A Closer Look into Transformer-Based Code Intelligence Through Code Transformation: Challenges and Opportunities

Authors

Yaoxian Li, Shiyi Qi, Cuiyun Gao, Yun Peng, David Lo, Zenglin Xu, Michael R. Lyu

Abstract

Transformer-based models have demonstrated state-of-the-art performance in many intelligent coding tasks, such as code comment generation and code completion. Previous studies show that deep learning models are sensitive to input variations, but few have systematically studied the robustness of the Transformer under perturbed input code. In this work, we empirically study the effect of semantic-preserving code transformations on the performance of the Transformer. Specifically, 24 and 27 code transformation strategies are implemented for two popular programming languages, Java and Python, respectively. To facilitate analysis, the strategies are grouped into five categories: block transformation, insertion/deletion transformation, grammatical statement transformation, grammatical token transformation, and identifier transformation. Experiments on three popular code intelligence tasks, including code completion, code summarization, and code search, demonstrate that insertion/deletion transformations and identifier transformations have the greatest impact on the performance of the Transformer. Our results also suggest that a Transformer based on abstract syntax trees (ASTs) is more robust than a model based only on the code sequence under most code transformations. In addition, the design of the positional encoding can affect the robustness of the Transformer under code transformation. Based on our findings, we distill insights into the challenges and opportunities for Transformer-based code intelligence.
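
To make the transformation categories concrete, below is a minimal sketch (our illustration, not the paper's implementation; the class names and the specific rewrites are hypothetical) of the two categories the abstract identifies as most impactful, identifier transformation and insertion transformation, applied to Python source via the standard ast module:

```python
import ast


class IdentifierRenamer(ast.NodeTransformer):
    """Identifier transformation: map every variable name to an opaque
    placeholder (e.g. x -> var_0) while preserving program semantics."""

    def __init__(self):
        self.mapping = {}

    def _rename(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"var_{len(self.mapping)}"
        return self.mapping[name]

    def visit_arg(self, node):   # function parameters
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):  # variable reads and writes
        # A production version would skip builtins, globals, and imports.
        node.id = self._rename(node.id)
        return node


class DeadCodeInserter(ast.NodeTransformer):
    """Insertion transformation: prepend an unused statement to each
    function body; runtime behavior is unchanged."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.body.insert(0, ast.parse("_unused = 0").body[0])
        return node


source = """\
def add(a, b):
    total = a + b
    return total
"""

tree = ast.parse(source)
tree = IdentifierRenamer().visit(tree)
tree = DeadCodeInserter().visit(tree)
ast.fix_missing_locations(tree)
print(ast.unparse(tree))  # ast.unparse requires Python 3.9+
```

Running the sketch rewrites `def add(a, b)` into `def add(var_0, var_1)` with an extra `_unused = 0` statement at the top of the body. Because the program's behavior is unchanged, any drop in a model's output quality on the transformed code reflects a robustness gap rather than a change in meaning, which is exactly the effect the paper measures.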
