Two Models are Better than One: Federated Learning Is Not Private For Google GBoard Next Word Prediction

Paper title

Two Models are Better than One: Federated Learning Is Not Private For Google GBoard Next Word Prediction

Authors

Mohamed Suliman, Douglas Leith

Abstract

In this paper, we present new attacks against federated learning when used to train natural language text models. We illustrate the effectiveness of the attacks against the next word prediction model used in Google's GBoard app, a widely used mobile keyboard app that has been an early adopter of federated learning for production use. We demonstrate that the words a user types on their mobile handset, e.g. when sending text messages, can be recovered with high accuracy under a wide range of conditions, and that counter-measures such as the use of mini-batches and the addition of local noise are ineffective. We also show that the word order (and so the actual sentences typed) can be reconstructed with high fidelity. This raises obvious privacy concerns, particularly since GBoard is in production use.
