A Simulation Study of Bandit Algorithms to Address External Validity of Software Fault Prediction

Paper Title

A Simulation Study of Bandit Algorithms to Address External Validity of Software Fault Prediction

Authors

Hayakawa, Teruki; Tsunoda, Masateru; Toda, Koji; Nakasai, Keitaro; Matsumoto, Kenichi

Abstract

Various software fault prediction models and techniques for building them have been proposed. Many studies have compared and evaluated them to identify the most effective ones. In most cases, however, no single model or technique performs best on every dataset. Software development datasets are diverse, so there is a risk that the selected model or technique performs poorly on a particular dataset. To avoid selecting a low-accuracy model, we apply bandit algorithms to fault prediction. Consider a player who has 100 coins to bet on several slot machines. Ordinary use of software fault prediction is analogous to the player betting all 100 coins on one slot machine. In contrast, bandit algorithms bet one coin at a time across the machines (i.e., the prediction models), step by step, to find the best machine. In the experiment, we developed an artificial dataset of 100 modules, 15 of which contain faults. We then developed various artificial fault prediction models and selected among them dynamically using bandit algorithms. The Thompson sampling algorithm showed the best or second-best prediction performance compared with using only one prediction model.
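The slot-machine analogy in the abstract can be illustrated with a minimal sketch of Thompson sampling over several prediction models. This is not the authors' implementation. The dataset size (100 modules, 15 faulty) comes from the abstract, but the model accuracies, the reward definition (1 when the chosen model's prediction matches the true label), the Beta(1, 1) prior, and all function names are assumptions made here for illustration.

```python
import random


def make_dataset(n_modules=100, n_faulty=15, seed=0):
    """Artificial dataset: 1 marks a faulty module, 0 a fault-free one."""
    rng = random.Random(seed)
    labels = [1] * n_faulty + [0] * (n_modules - n_faulty)
    rng.shuffle(labels)
    return labels


def make_model(accuracy, rng):
    """Hypothetical artificial model: correct with probability `accuracy`."""
    def predict(true_label):
        return true_label if rng.random() < accuracy else 1 - true_label
    return predict


def thompson_sampling(labels, models, rng):
    """Pick one model per module by sampling from Beta posteriors."""
    n = len(models)
    successes = [1] * n  # Beta(1, 1) prior (assumption)
    failures = [1] * n
    pulls = [0] * n
    correct = 0
    for true_label in labels:
        # Draw a plausible accuracy for each model and use the best draw.
        draws = [rng.betavariate(successes[i], failures[i]) for i in range(n)]
        arm = max(range(n), key=draws.__getitem__)
        reward = int(models[arm](true_label) == true_label)
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        correct += reward
    return correct, pulls


if __name__ == "__main__":
    rng = random.Random(42)
    labels = make_dataset()
    # Assumed accuracies; the abstract does not state the values used.
    models = [make_model(a, rng) for a in (0.55, 0.70, 0.85)]
    correct, pulls = thompson_sampling(labels, models, rng)
    print("correct predictions:", correct, "pulls per model:", pulls)
```

Each module plays the role of one coin: the algorithm spends it on whichever model currently looks most promising, so a weak model receives only a few bets before the posterior steers selection toward stronger ones.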
