Compilable Neural Code Generation with Compiler Feedback

Paper Title

Compilable Neural Code Generation with Compiler Feedback

Authors

Xin Wang, Yasheng Wang, Yao Wan, Fei Mi, Yitong Li, Pingyi Zhou, Jin Liu, Hao Wu, Xin Jiang, Qun Liu

Abstract

Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering. Existing deep-learning approaches model code generation as text generation, either constrained by grammar structures in the decoder or driven by language models pre-trained on large-scale code corpora (e.g., CodeGPT, PLBART, and CodeT5). However, few of them account for the compilability of the generated programs. To improve the compilability of the generated programs, this paper proposes COMPCODER, a three-stage pipeline that uses compiler feedback for compilable code generation, consisting of language model fine-tuning, compilability reinforcement, and compilability discrimination. Comprehensive experiments on two code generation tasks demonstrate the effectiveness of the proposed approach: compared with the state-of-the-art CodeGPT, the compilation success rate improves from 44.18 to 89.18 on average in code completion and from 70.3 to 96.2 in text-to-code generation.
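
The abstract does not spell out how compiler feedback is obtained, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the generated programs are Python and uses Python's built-in `compile()` as a stand-in compiler. The helper names `is_compilable`, `compilation_success_rate`, and `first_compilable` are hypothetical. The binary pass/fail signal is the kind of feedback that could serve as a reward during reinforcement or as a filter over candidates at inference time.

```python
from typing import List, Optional


def is_compilable(source: str) -> bool:
    """Return True if Python's built-in compiler accepts the source text.

    Assumption: syntax-level compilation is used as the feedback signal;
    the code is compiled but never executed.
    """
    try:
        compile(source, "<candidate>", "exec")
        return True
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code (including indentation errors);
        # ValueError can be raised for source containing null bytes.
        return False


def compilation_success_rate(candidates: List[str]) -> float:
    """Percentage of candidate programs that compile (0.0 for an empty list)."""
    if not candidates:
        return 0.0
    passed = sum(is_compilable(c) for c in candidates)
    return 100.0 * passed / len(candidates)


def first_compilable(candidates: List[str]) -> Optional[str]:
    """Return the first candidate that compiles, or None if none does.

    Candidates are assumed to be ordered by model preference, so this acts
    as a simple compiler-based filter over ranked generations.
    """
    for candidate in candidates:
        if is_compilable(candidate):
            return candidate
    return None


if __name__ == "__main__":
    generated = [
        "x = 1\nprint(x)",                # valid
        "def f(:\n    pass",              # invalid parameter list
        "for i in range(3):\n    pass",   # valid
    ]
    print(f"success rate: {compilation_success_rate(generated):.2f}")
    print(first_compilable(["def f(:", "y = 2"]))
```

A learned discriminator, as named in the abstract, would replace the direct compiler call at inference time; the sketch keeps the compiler in the loop only to show where the signal comes from.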
