Paper Title
Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes
Paper Authors
Paper Abstract
In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. If a sequence consists of thousands or even millions of elements, the intermediate data become prohibitively large to store, which makes learning very long-term dependencies infeasible. However, the majority of sequence elements can usually be predicted from temporally local information alone. Predictions affected by long-term dependencies, on the other hand, are sparse and characterized by high uncertainty when only local information is available. We propose MemUP, a new training method that makes it possible to learn long-term dependencies without backpropagating gradients through the whole sequence at once. The method can potentially be applied to any recurrent architecture. An LSTM network trained with MemUP performs better than or comparably to baselines while requiring less intermediate data to be stored.
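The abstract describes two ingredients at a high level: truncated recurrent training that never backpropagates through the whole sequence, and a loss concentrated on the few targets that are highly uncertain given only local information. The sketch below illustrates how that combination could look on a toy regression stream. It is a hypothetical illustration, not the authors' MemUP implementation: the module names (`memory`, `local_predictor`, `readout`), the use of local prediction error as the uncertainty proxy, and the top-k selection rule are all assumptions made here for concreteness.

```python
# Hypothetical sketch (not the authors' code): truncated recurrent training
# where the memory's loss is restricted to the k most uncertain steps of
# each segment, uncertainty being approximated by a local predictor's error.
import torch
import torch.nn as nn
import torch.nn.functional as F

seq_len, segment_len, batch, dim, k = 10_000, 50, 8, 32, 4

memory = nn.LSTM(dim, dim, batch_first=True)   # recurrent long-term memory
local_predictor = nn.Linear(dim, dim)          # sees only the current input
readout = nn.Linear(2 * dim, dim)              # combines memory output and input
opt = torch.optim.Adam(
    [*memory.parameters(), *local_predictor.parameters(), *readout.parameters()],
    lr=1e-3,
)

x = torch.randn(batch, seq_len, dim)           # toy input sequence
y = torch.randn(batch, seq_len, dim)           # toy per-step regression targets

state = None
for t in range(0, seq_len, segment_len):
    xs, ys = x[:, t:t + segment_len], y[:, t:t + segment_len]

    # 1. Local predictions; their per-step error serves as an uncertainty
    #    proxy for deciding which targets the memory should be trained on.
    local_pred = local_predictor(xs)
    local_loss = F.mse_loss(local_pred, ys)
    uncertainty = (local_pred - ys).pow(2).mean(-1).detach()   # (batch, seg)

    # 2. Keep only the k most uncertain steps; the rest are assumed to be
    #    predictable from local information alone.
    idx = uncertainty.topk(k, dim=1).indices                   # (batch, k)

    # 3. Run the memory over the segment and detach the carried state, so
    #    backpropagation never spans more than one segment.
    out, state = memory(xs, state)
    state = tuple(s.detach() for s in state)

    pred = readout(torch.cat([out, xs], dim=-1))
    gather_idx = idx.unsqueeze(-1).expand(-1, -1, dim)
    memory_loss = F.mse_loss(pred.gather(1, gather_idx), ys.gather(1, gather_idx))

    loss = memory_loss + local_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that this simplified variant only selects uncertain targets within the current segment; in the method the abstract describes, the memory is trained to predict uncertain future outcomes, so that useful information is carried across segment boundaries even though gradients are not.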