Title

Which Pull Requests Get Accepted and Why? A study of popular NPM Packages

Authors

Tapajit Dey, Audris Mockus

Abstract

Background: Pull request (PR) integrators often have to handle many concurrent PRs, so the ability to gauge which PRs will get accepted can help them balance their workload. PR creators would benefit from knowing whether certain characteristics of their PRs may increase the chances of acceptance.

Aim: We modeled the probability that a PR will be accepted within a month after creation, using a Random Forest model with 50 predictors representing properties of the author, the PR, and the project to which the PR is submitted.

Method: We analyzed 483,988 PRs from 4,218 popular NPM packages and selected a subset of 14 predictors sufficient for a tuned Random Forest model to reach high accuracy.

Result: The model achieved an AUC-ROC of 0.95 in predicting PR acceptance. A model excluding PR properties that change after submission achieved an AUC-ROC of 0.89. We tested the utility of our model in a practical scenario by training it on historical data for the NPM package bootstrap and predicting whether PRs submitted later would be accepted. This yielded an AUC-ROC of 0.94 with all 14 predictors, and 0.77 when excluding PR properties that change after creation.

Conclusion: PR integrators can use our model for a highly accurate assessment of the quality of open PRs, and PR creators may benefit from the model by understanding which characteristics of their PRs may be undesirable from the integrators' perspective. The model can be implemented as a tool, which we plan to do as future work.
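The evaluation summarized above (train on historical PRs, then score PRs submitted later and report AUC-ROC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the Random Forest itself is omitted, the field names `created`, `accepted`, and `score` are hypothetical, and the scores are made-up placeholders for a trained model's predicted acceptance probabilities. AUC-ROC is computed through its rank-based interpretation: the probability that a randomly chosen accepted PR receives a higher score than a randomly chosen non-accepted one.

```python
from typing import Dict, List, Sequence, Tuple


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC computed as a Mann-Whitney pairwise probability.

    labels: 1 if the PR was accepted, 0 otherwise.
    scores: predicted acceptance probabilities from some classifier.
    Returns the fraction of (accepted, not accepted) pairs in which the
    accepted PR has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one accepted and one non-accepted PR")
    wins = 0.0
    for p in pos:  # O(len(pos) * len(neg)); acceptable for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def temporal_split(prs: List[Dict], cutoff: int) -> Tuple[List[Dict], List[Dict]]:
    """Split PRs into a historical (training) set and a later (test) set.

    'created' is a hypothetical field holding the PR creation time.
    """
    history = [pr for pr in prs if pr["created"] < cutoff]
    future = [pr for pr in prs if pr["created"] >= cutoff]
    return history, future


# Hypothetical data: 'score' stands in for the output of a model trained
# on the historical PRs; these numbers are not results from the paper.
prs = [
    {"created": 1, "accepted": 1, "score": 0.8},
    {"created": 2, "accepted": 0, "score": 0.3},
    {"created": 5, "accepted": 1, "score": 0.9},
    {"created": 6, "accepted": 1, "score": 0.4},
    {"created": 7, "accepted": 0, "score": 0.6},
    {"created": 8, "accepted": 0, "score": 0.1},
]
history, future = temporal_split(prs, cutoff=5)
print(len(history), len(future))  # → 2 4
print(auc_roc([pr["accepted"] for pr in future],
              [pr["score"] for pr in future]))  # → 0.75
```

Splitting by creation time rather than at random mirrors the practical scenario described in the abstract, where a model fitted on past PRs must judge PRs it has not yet seen.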