Paper Title
Intermediate-layer output Regularization for Attention-based Speech Recognition with Shared Decoder
Paper Authors
Paper Abstract
Intermediate layer output (ILO) regularization by means of multitask training on the encoder side has been shown to be an effective approach to improving results across a wide range of end-to-end ASR frameworks. In this paper, we propose a novel way to perform ILO-regularized training. Instead of using conventional multitask methods, which entail extra training overhead, we feed the intermediate layer output directly to the decoder: during training, our decoder accepts not only the output of the final encoder layer but also the encoder's ILO as input. With the proposed method, both the encoder and the decoder are simultaneously "regularized", so the network is trained more thoroughly, consistently yielding improved results over the ILO-based CTC method as well as over the original attention-based model trained without the proposed method.
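The training objective described in the abstract, i.e. running both the final encoder output and the intermediate layer output through one shared decoder and summing the losses, might be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the "decoder" here is a hypothetical single linear layer with softmax, and the unweighted sum of the two losses is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode(h, W):
    """Hypothetical shared decoder: one linear projection + softmax.
    The same weights W are used for both encoder branches."""
    logits = h @ W
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target tokens."""
    return -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))

T, d, vocab = 5, 8, 10                    # toy sizes (assumed)
W = rng.normal(size=(d, vocab))           # shared decoder parameters
h_inter = rng.normal(size=(T, d))         # intermediate encoder layer output (ILO)
h_final = rng.normal(size=(T, d))         # final encoder layer output
targets = rng.integers(0, vocab, size=T)  # reference token ids

# Both branches pass through the SAME decoder; the combined loss
# regularizes the encoder (via the intermediate branch) and gives the
# decoder extra gradient signal at the same time.
loss = cross_entropy(decode(h_final, W), targets) + \
       cross_entropy(decode(h_inter, W), targets)
```

In a real attention-based ASR system the shared decoder would be an autoregressive attention decoder rather than a linear layer, but the key point carries over: no auxiliary CTC head is added, so the extra regularization branch reuses the existing decoder parameters.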