Paper Title

AriEL: volume coding for sentence generation

Paper Authors

Luca Celotti, Simon Brodeur, Jean Rouat

Paper Abstract

Mapping sequences of discrete data to a point in a continuous space makes it difficult to retrieve those sequences via random sampling. Mapping the input to a volume instead would make retrieval easier at test time, and that is the strategy followed by the family of approaches based on the Variational Autoencoder. However, because these approaches optimize simultaneously for prediction and for smoothness of representation, they are forced to trade off between the two. We improve on the performance of some standard deep learning methods at generating sentences by uniformly sampling a continuous space. We do so by proposing AriEL, which constructs volumes in a continuous space without needing to encourage volume creation through the loss function. We first benchmark on a toy grammar, which allows the language learned and generated by the models to be evaluated automatically. We then benchmark on a real dataset of human dialogues. Our results indicate that random access to the stored information is dramatically improved, and that AriEL is able to generate a wider variety of correct language by randomly sampling the latent space. The VAE follows in performance on the toy dataset, while the AE and the Transformer follow on the real dataset. This partially supports the hypothesis that encoding information into volumes rather than into points can improve the retrieval of learned information by random sampling. This can lead to better generators, and we also discuss potential disadvantages.
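
To make the contrast between point coding and volume coding concrete, below is a minimal, illustrative Python sketch. It is not the authors' AriEL construction (whose partitions are shaped by a learned language model); it simply assumes a hypothetical toy vocabulary and a fixed uniform partition. Each token sequence owns a nested sub-interval (a "volume") of [0, 1], so any point sampled inside that interval decodes back to the same sequence, and every point in [0, 1] decodes to some valid sequence.

```python
# Toy illustration of volume coding: sequences own intervals of [0, 1],
# so uniform random samples of the latent space always decode to a sequence.
# This is a simplified sketch, not the AriEL construction from the paper.

import random

VOCAB = ["<eos>", "the", "dog", "cat", "runs", "sleeps"]  # hypothetical toy vocabulary


def encode(tokens, vocab=VOCAB):
    """Map a token sequence to the (low, high) interval it owns in [0, 1]."""
    low, high = 0.0, 1.0
    for tok in tokens + ["<eos>"]:
        width = (high - low) / len(vocab)
        idx = vocab.index(tok)
        low, high = low + idx * width, low + (idx + 1) * width
    return low, high


def decode(point, max_len=10, vocab=VOCAB):
    """Map any point in [0, 1] back to the sequence whose interval contains it."""
    low, high, tokens = 0.0, 1.0, []
    for _ in range(max_len):
        width = (high - low) / len(vocab)
        idx = min(int((point - low) / width), len(vocab) - 1)  # guard float edge cases
        tok = vocab[idx]
        low, high = low + idx * width, low + (idx + 1) * width
        if tok == "<eos>":
            break
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    lo, hi = encode(["the", "dog", "runs"])
    print(decode(random.uniform(lo, hi)))  # any sample inside the volume recovers the sentence
    print(decode(random.random()))         # uniform sampling always yields some valid sequence
```

In this toy version the partition is uniform, so all sequences of the same length get equal-sized volumes; the paper's point is that volumes can be built directly by the encoding scheme rather than indirectly encouraged through a loss term, as the VAE does.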
