Paper Title

Macro-block dropout for improved regularization in training end-to-end speech recognition models

Authors

Chanwoo Kim, Sathish Indurti, Jinhwan Park, Wonyong Sung

Abstract

This paper proposes a new regularization algorithm referred to as macro-block dropout. Overfitting has been a persistent problem in training large neural network models. The dropout technique has proven to be simple yet very effective for regularization by preventing complex co-adaptations during training. In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN). Rather than applying dropout to each unit, we apply random dropout to each macro-block. This algorithm has the effect of applying a different dropout rate to each layer even while keeping the average dropout rate constant, which yields a stronger regularization effect. In our experiments using a Recurrent Neural Network-Transducer (RNN-T), this algorithm achieves relative Word Error Rate (WER) improvements of 4.30 % and 6.13 % over conventional dropout on the LibriSpeech test-clean and test-other sets. With an Attention-based Encoder-Decoder (AED) model, this algorithm achieves relative WER improvements of 4.36 % and 5.85 % over conventional dropout on the same test sets.
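
To make the idea concrete, below is a minimal PyTorch sketch of macro-block dropout applied to the input of an RNN layer. The number of macro-blocks, the choice to share one keep/drop decision across all time steps, and the inverted-dropout rescaling by 1/(1 - p) are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of macro-block dropout (assumed details, see lead-in above).
import torch


def macro_block_dropout(x: torch.Tensor, p: float = 0.2,
                        num_blocks: int = 4, training: bool = True) -> torch.Tensor:
    """Randomly zero whole macro-blocks of features instead of individual units.

    x: input of shape (batch, time, feature_dim).
    p: average dropout rate; each macro-block is kept with probability 1 - p.
    """
    if not training or p == 0.0:
        return x

    batch, time, feat = x.shape
    assert feat % num_blocks == 0, "feature dim must be divisible by num_blocks"

    # One Bernoulli keep/drop decision per (example, macro-block),
    # shared across all time steps and all units inside the block
    # (an assumption for this sketch).
    keep = (torch.rand(batch, 1, num_blocks, device=x.device) >= p).to(x.dtype)

    # Broadcast the block-level mask back to unit resolution.
    mask = keep.repeat_interleave(feat // num_blocks, dim=2)

    # Inverted-dropout rescaling keeps the expected activation magnitude unchanged.
    return x * mask / (1.0 - p)


if __name__ == "__main__":
    inputs = torch.randn(8, 100, 512)                 # (batch, time, features)
    out = macro_block_dropout(inputs, p=0.2, num_blocks=4)
    print(out.shape)                                  # torch.Size([8, 100, 512])
```

Because the mask is sampled per block rather than per unit, the realized fraction of dropped units in any given layer fluctuates around p from step to step, which is the effect the abstract describes as applying different dropout rates per layer while keeping the average rate constant.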
