Paper Title
Privacy Adhering Machine Un-learning in NLP
Paper Authors
Paper Abstract
Regulations such as the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act (CCPA) in the US include provisions on the \textit{right to be forgotten} that mandate industry applications to remove data related to an individual from their systems. In several real-world industry applications that use Machine Learning to build models on user data, such mandates require significant effort both in terms of data cleansing and model retraining, while ensuring the models do not deteriorate in prediction quality due to the removal of data. As a result, continuous data removal and model retraining do not scale if these applications receive such requests at a very high frequency. Recently, a few researchers proposed the idea of \textit{Machine Unlearning} to tackle this challenge. Despite its significant importance, Machine Unlearning remains under-explored for Natural Language Processing (NLP) tasks. In this paper, we explore the Unlearning framework on various GLUE tasks \cite{Wang:18}, such as QQP, SST, and MNLI. We propose computationally efficient approaches (SISA-FC and SISA-A) to perform \textit{guaranteed} Unlearning that provide significant reductions in memory (90-95\%), time (100x), and space consumption (99\%) compared to the baselines while keeping model performance constant.
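The abstract builds on the general SISA (Sharded, Isolated, Sliced, Aggregated) recipe for guaranteed unlearning: training data is split into disjoint shards, one constituent model is trained per shard, and predictions are aggregated across shards; deleting an example then only requires retraining the single shard that contained it. The sketch below illustrates that idea only, not the paper's SISA-FC/SISA-A variants; all names and the trivial majority-label "model" are illustrative assumptions.

```python
from collections import Counter

def train(shard):
    # Stand-in "model": predicts the majority label seen in its shard.
    # (A real SISA constituent would be a learned classifier.)
    labels = [y for _, y in shard]
    return Counter(labels).most_common(1)[0][0] if labels else None

def sisa_fit(data, n_shards):
    # Partition data into disjoint shards and train one model per shard.
    shards = [data[i::n_shards] for i in range(n_shards)]
    models = [train(s) for s in shards]
    return shards, models

def sisa_predict(models):
    # Aggregate constituent predictions by majority vote.
    votes = Counter(m for m in models if m is not None)
    return votes.most_common(1)[0][0]

def unlearn(shards, models, example):
    # Guaranteed deletion: remove the example and retrain ONLY its shard,
    # instead of retraining on the full dataset.
    for i, shard in enumerate(shards):
        if example in shard:
            shard.remove(example)
            models[i] = train(shard)
            return
```

The cost asymmetry is the point: with `n_shards` shards, honoring one deletion request touches roughly `1/n_shards` of the data, which is what makes high-frequency right-to-be-forgotten requests tractable.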