Title
A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck
Authors
Abstract
We propose a VAE for Transformers by developing a variational information bottleneck regulariser for Transformer embeddings. We formalise the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components supported by nonparametric methods captures the variable number of vectors supported by attention, and the exchangeability of our nonparametric distributions captures the permutation invariance of attention. This allows NVIB to regularise the number of vectors accessible with attention, as well as the amount of information in individual vectors. By regularising the cross-attention of a Transformer encoder-decoder with NVIB, we propose a nonparametric variational autoencoder (NVAE). Initial experiments on training an NVAE on natural language text show that the induced embedding space has the desired properties of a VAE for Transformers.
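The abstract's core idea, viewing an encoder's set of output vectors as a mixture distribution and querying it with attention, can be illustrated with a deliberately simplified sketch. This is not the paper's NVIB (which uses Bayesian nonparametric priors such as Dirichlet processes); it is a toy analogy in which the encoder's n vectors become an n-component mixture with explicit weights, cross-attention becomes a query-reweighted expectation over that mixture, and a KL divergence of the attention distribution from a uniform prior stands in for a bottleneck penalty on how many vectors attention effectively accesses. All function names here are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(query, weights, means):
    """Attention as a reweighted expectation over mixture components:
    logits combine the log mixture weights with scaled query-key dot
    products, and the output is the expectation of the component means
    under the resulting attention distribution p."""
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, mean)) / math.sqrt(d)
              for mean in means]
    p = softmax([math.log(w) + s for w, s in zip(weights, scores)])
    out = [sum(pi * mean[j] for pi, mean in zip(p, means)) for j in range(d)]
    return out, p

def kl_to_uniform(p):
    """KL(p || uniform): a stand-in bottleneck penalty that is zero when
    attention spreads evenly over all components and grows as attention
    concentrates on few vectors (loosely analogous to NVIB regularising
    the number of vectors accessible with attention)."""
    n = len(p)
    return sum(pi * (math.log(pi + 1e-12) + math.log(n)) for pi in p)
```

For example, a concentrated attention distribution such as [0.97, 0.01, 0.01, 0.01] incurs a large penalty, while a uniform one incurs essentially none; in a real NVAE this kind of term would be added to the reconstruction loss during training.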