Paper Title
Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes
Paper Authors
Paper Abstract
In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. If a sequence consists of thousands or even millions of elements, the intermediate data become prohibitively large to store, which makes learning very long-term dependencies infeasible. However, the majority of sequence elements can usually be predicted from temporally local information alone. Predictions affected by long-term dependencies, on the other hand, are sparse and characterized by high uncertainty when only local information is available. We propose MemUP, a new training method that makes it possible to learn long-term dependencies without backpropagating gradients through the whole sequence at once. The method can potentially be applied to any recurrent architecture. An LSTM network trained with MemUP performs better than or comparably to baselines while requiring less intermediate data to be stored.
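The abstract describes two ingredients at a high level: truncated recurrent training that never backpropagates through the whole sequence, and a loss concentrated on the few targets that are highly uncertain given only local information. The sketch below illustrates how that combination could look on a toy regression stream. It is a hypothetical illustration, not the authors' MemUP implementation: the module names (`memory`, `local_predictor`, `readout`), the use of local prediction error as the uncertainty proxy, and the top-k selection rule are all assumptions made here for concreteness.

```python
# Hypothetical sketch (not the authors' code): truncated recurrent training
# where the memory's loss is restricted to the k most uncertain steps of
# each segment, uncertainty being approximated by a local predictor's error.
import torch
import torch.nn as nn
import torch.nn.functional as F

seq_len, segment_len, batch, dim, k = 10_000, 50, 8, 32, 4

memory = nn.LSTM(dim, dim, batch_first=True)   # recurrent long-term memory
local_predictor = nn.Linear(dim, dim)          # sees only the current input
readout = nn.Linear(2 * dim, dim)              # combines memory output and input
opt = torch.optim.Adam(
    [*memory.parameters(), *local_predictor.parameters(), *readout.parameters()],
    lr=1e-3,
)

x = torch.randn(batch, seq_len, dim)           # toy input sequence
y = torch.randn(batch, seq_len, dim)           # toy per-step regression targets

state = None
for t in range(0, seq_len, segment_len):
    xs, ys = x[:, t:t + segment_len], y[:, t:t + segment_len]

    # 1. Local predictions; their per-step error serves as an uncertainty
    #    proxy for deciding which targets the memory should be trained on.
    local_pred = local_predictor(xs)
    local_loss = F.mse_loss(local_pred, ys)
    uncertainty = (local_pred - ys).pow(2).mean(-1).detach()   # (batch, seg)

    # 2. Keep only the k most uncertain steps; the rest are assumed to be
    #    predictable from local information alone.
    idx = uncertainty.topk(k, dim=1).indices                   # (batch, k)

    # 3. Run the memory over the segment and detach the carried state, so
    #    backpropagation never spans more than one segment.
    out, state = memory(xs, state)
    state = tuple(s.detach() for s in state)

    pred = readout(torch.cat([out, xs], dim=-1))
    gather_idx = idx.unsqueeze(-1).expand(-1, -1, dim)
    memory_loss = F.mse_loss(pred.gather(1, gather_idx), ys.gather(1, gather_idx))

    loss = memory_loss + local_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note that this simplified variant only selects uncertain targets within the current segment; in the method the abstract describes, the memory is trained to predict uncertain future outcomes, so that useful information is carried across segment boundaries even though gradients are not.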