Paper Title

Interpretation of Black Box NLP Models: A Survey

Paper Authors

Shivani Choudhary, Niladri Chatterjee, Subir Kumar Saha

Paper Abstract

An increasing number of machine learning models are being deployed in high-stakes domains such as finance and healthcare. Despite their superior performance, many of these models are black boxes that are inherently hard to interpret, and there is a growing research effort to develop methods for explaining them. Post hoc explanations based on perturbations, such as LIME, are widely used to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing a serious challenge to the effectiveness of the methods themselves and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee the stability of the resulting explanations. Experiments on both simulated and real-world data sets demonstrate the effectiveness of our method.
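The abstract touches on two concrete ideas: the run-to-run instability of LIME's perturbation-based explanations, and a CLT-based criterion for choosing how many perturbation points to draw. The sketch below illustrates both under stated assumptions: it uses the off-the-shelf lime and scikit-learn packages with a toy dataset and model chosen purely for illustration, and the clt_sample_size helper is a textbook CLT sample-size bound, not the paper's actual hypothesis testing procedure.

    # A minimal sketch, not the paper's method: (a) shows LIME's run-to-run
    # instability, (b) shows a textbook CLT sample-size bound of the kind the
    # abstract alludes to. Dataset, model, and parameter values are assumptions.
    import numpy as np
    from scipy.stats import norm
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_tabular import LimeTabularExplainer

    data = load_breast_cancer()
    model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

    def top_features(num_samples, seed):
        """Top-5 features LIME selects for one fixed instance under one seed."""
        explainer = LimeTabularExplainer(
            data.data,
            feature_names=list(data.feature_names),
            mode="classification",
            random_state=seed,  # only the perturbation seed differs across runs
        )
        exp = explainer.explain_instance(
            data.data[0], model.predict_proba,
            num_features=5, num_samples=num_samples,
        )
        return [name for name, _ in exp.as_list()]

    # (a) Instability: two runs differing only in the random seed can select
    # different top-5 feature sets for the same instance and the same model.
    print(top_features(num_samples=1000, seed=1))
    print(top_features(num_samples=1000, seed=2))

    def clt_sample_size(sigma, eps, alpha=0.05):
        """Textbook CLT bound: perturbation points needed to estimate a mean
        feature weight within +/- eps at confidence level 1 - alpha."""
        z = norm.ppf(1 - alpha / 2)
        return int(np.ceil((z * sigma / eps) ** 2))

    # (b) E.g., a weight with standard deviation 0.3 estimated to +/- 0.01:
    print(clt_sample_size(sigma=0.3, eps=0.01))

Intuitively, if the two printed feature lists in (a) disagree, the explanation at that sample size is unstable; a CLT-style bound as in (b) says how many more perturbation points would be needed to pin each feature weight down to a given precision, which is the flavor of guarantee the abstract claims for S-LIME.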
