Paper Title
TextHide: Tackling Data Privacy in Language Understanding Tasks
Paper Authors
Paper Abstract
An unsolved challenge in distributed or federated learning is to effectively mitigate privacy risks without slowing down training or reducing accuracy. In this paper, we propose TextHide, which aims at addressing this challenge for natural language understanding tasks. It requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data. Such an encryption step is efficient and only slightly affects task performance. In addition, TextHide fits well with the popular framework of fine-tuning pre-trained language models (e.g., BERT) for any sentence or sentence-pair task. We evaluate TextHide on the GLUE benchmark, and our experiments show that TextHide can effectively defend against attacks on shared gradients or representations, with an average accuracy reduction of only $1.9\%$. We also present an analysis of the security of TextHide based on a conjecture about the computational intractability of a mathematical problem. Our code is available at https://github.com/Hazelsuko07/TextHide
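The abstract does not spell out the encryption step. As a hedged illustration only (based on the InstaHide-style scheme that TextHide builds on, not on details given here): each sentence representation is mixed with a few others from the batch using random convex weights, then multiplied entrywise by a random sign mask drawn from a small private pool. A minimal NumPy sketch, where the function name and all parameters are illustrative assumptions:

```python
import numpy as np

def texthide_encrypt(reps, k=4, num_masks=16, rng=None):
    """Sketch of a TextHide-style encryption of sentence representations.

    reps: (n, d) array of encoder outputs (e.g., BERT [CLS] vectors).
    Each encrypted vector is a random convex combination of k
    representations, multiplied entrywise by a {-1, +1} sign mask
    drawn from a small private pool (assumed mechanism).
    """
    rng = rng or np.random.default_rng()
    n, d = reps.shape
    # Private pool of random sign masks, one of which is applied per example.
    mask_pool = rng.choice([-1.0, 1.0], size=(num_masks, d))
    encrypted = np.empty_like(reps)
    for i in range(n):
        idx = rng.choice(n, size=k, replace=False)   # pick k reps to mix
        coeffs = rng.dirichlet(np.ones(k))           # random convex weights
        mixed = coeffs @ reps[idx]                   # (d,) mixed representation
        sigma = mask_pool[rng.integers(num_masks)]   # random sign mask
        encrypted[i] = sigma * mixed
    return encrypted
```

In this sketch the sign masks act as the "one-time-pad"-like secret: an eavesdropper who sees only the masked mixtures cannot easily invert them back to individual private representations.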