Title
Privacy-Preserving Text Classification on BERT Embeddings with Homomorphic Encryption
Authors
Abstract
Embeddings, which compress the information in raw text into semantics-preserving low-dimensional vectors, have been widely adopted for their efficacy. However, recent research has shown that embeddings can leak private information about sensitive attributes of the text and, in some cases, can be inverted to recover the original input text. To address these growing privacy challenges, we propose a privatization mechanism for embeddings based on homomorphic encryption that prevents the leakage of any information during text classification. In particular, our method performs text classification directly on encrypted embeddings from state-of-the-art models such as BERT, supported by an efficient GPU implementation of the CKKS encryption scheme. We show that our method provides cryptographic protection for BERT embeddings while largely preserving their utility on downstream text classification tasks.
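The central idea, a server scoring an embedding it cannot read, can be illustrated with a much simpler scheme than CKKS. The sketch below is a minimal, hedged illustration and not the paper's method. It swaps in textbook Paillier encryption, which is additively homomorphic and integer-only, and runs on the CPU. It uses a hypothetical 4-dimensional vector standing in for a BERT embedding, made-up linear-classifier weights, and helper names (`encrypt`, `decrypt`, `encode`, `encrypted_linear_score`) invented for this example. The hard-coded primes are far too small for real security. Real numbers are handled with fixed-point scaling, loosely analogous to the scaling factor CKKS uses for approximate arithmetic.

```python
import math
import random

# Toy key material: the Mersenne primes 2^89 - 1 and 2^107 - 1.
# Far too small and too structured for real use; illustration only.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q                      # public key (generator g = N + 1)
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # private key part 1 (Python 3.9+)
MU = pow(LAM, -1, N)           # private key part 2 (valid because g = N + 1)
SCALE = 10**4                  # fixed-point scale for real-valued inputs


def encrypt(m: int) -> int:
    """Paillier encryption with g = N + 1: c = (1 + m*N) * r^N mod N^2."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Return m mod N, mapping the upper half of Z_N to negative integers."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def encode(x: float) -> int:
    """Fixed-point encoding of a real number at scale SCALE."""
    return round(x * SCALE)


def encrypted_linear_score(enc_x: list[int], weights: list[float], bias: float) -> int:
    """Server side: return Enc(<w, x> + b) at scale SCALE^2 without decrypting x.

    Multiplying two ciphertexts adds their plaintexts; raising a ciphertext
    to a plaintext power k multiplies its plaintext by k (mod N).
    """
    acc = encrypt(encode(bias) * SCALE)  # bias must match the SCALE^2 of the products
    for c, w in zip(enc_x, weights):
        acc = acc * pow(c, encode(w) % N, N2) % N2
    return acc


# Client: a hypothetical 4-d stand-in for a BERT sentence embedding.
embedding = [0.12, -0.53, 0.88, 0.05]
enc_x = [encrypt(encode(v)) for v in embedding]

# Server: hypothetical logistic-regression weights, applied to ciphertexts only.
weights = [1.5, -0.7, 0.3, 2.0]
bias = -0.1
enc_score = encrypted_linear_score(enc_x, weights, bias)

# Client: decrypt the single score and threshold it (sigmoid > 0.5 iff score > 0).
score = decrypt(enc_score) / SCALE**2
label = int(score > 0)
plain = sum(x * w for x, w in zip(embedding, weights)) + bias
print(f"encrypted score {score:.4f}, plaintext score {plain:.4f}, label {label}")
```

The server only ever handles ciphertexts and its own plaintext weights, and the client decrypts a single score. CKKS also packs an entire vector into one ciphertext through SIMD slots and supports ciphertext-ciphertext multiplication. Textbook Paillier offers neither, which is one reason a CKKS-based design such as the one described in the abstract is better suited to batched GPU evaluation.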