Paper Title
TestAug: A Framework for Augmenting Capability-based NLP Tests
Authors
Abstract
Recently proposed capability-based NLP testing allows model developers to test the functional capabilities of NLP models, revealing functional failures that the traditional held-out evaluation mechanism cannot detect. However, existing work on capability-based testing requires extensive manual effort and domain expertise to create the test cases. In this paper, we investigate a low-cost approach to test case generation that leverages the GPT-3 engine. We further propose using a classifier to remove invalid outputs from GPT-3 and expanding the remaining outputs into templates to generate more test cases. Our experiments show that TestAug has three advantages over existing work on behavioral testing: (1) TestAug can find more bugs than existing work; (2) the test cases in TestAug are more diverse; and (3) TestAug substantially reduces the manual effort needed to create the test suites. The code and data for TestAug can be found at our project website (https://guanqun-yang.github.io/testaug/) and on GitHub (https://github.com/guanqun-yang/testaug).
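The abstract's filter-then-expand step can be illustrated with a minimal sketch. This is not the TestAug implementation: the function names `filter_valid` and `expand_template`, the `{slot}` template syntax, the toy validity rule, and the sample sentences are all assumptions made for illustration. The hard-coded candidate list stands in for GPT-3 output, and the lambda stands in for a trained classifier.

```python
import re
from itertools import product
from typing import Callable, Dict, List


def filter_valid(candidates: List[str], is_valid: Callable[[str], bool]) -> List[str]:
    """Keep only the candidates that the (hypothetical) validity classifier accepts."""
    return [c for c in candidates if is_valid(c)]


def expand_template(template: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Fill every {slot} in the template with each combination of lexicon entries."""
    # Collect slot names in order of first appearance, without duplicates.
    slots = list(dict.fromkeys(re.findall(r"\{(\w+)\}", template)))
    choices = [lexicon[slot] for slot in slots]
    return [
        template.format(**dict(zip(slots, combo)))
        for combo in product(*choices)
    ]


# Stand-in for sentences returned by a language model (assumed data).
candidates = [
    "The food was not good.",
    "asdf",
    "The service was not great.",
]

# Toy rule standing in for a trained classifier: a negation test case
# must be a full sentence that contains "not".
valid = filter_valid(candidates, lambda s: s.endswith(".") and " not " in s)

# A template abstracted from the valid sentences (assumed, written by hand here).
test_cases = expand_template(
    "The {noun} was not {adj}.",
    {"noun": ["food", "service"], "adj": ["good", "great"]},
)

for case in test_cases:
    print(case)
```

Two seed sentences yield four test cases here. The count grows multiplicatively with the lexicon size, which is how template expansion can produce many cases from few generated outputs.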