Paper Title
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding
Paper Authors
Paper Abstract
Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable, and customizable encoder alternative, Branchformer, with parallel branches for modeling dependencies of various ranges in end-to-end speech processing. In each encoder layer, one branch employs self-attention or one of its variants to capture long-range dependencies, while the other branch utilizes an MLP module with convolutional gating (cgMLP) to extract local relationships. We conduct experiments on several speech recognition and spoken language understanding benchmarks. Results show that our model outperforms both Transformer and cgMLP. It also matches or outperforms the state-of-the-art results achieved by Conformer. Furthermore, thanks to the two-branch architecture, we demonstrate various strategies for reducing computation, including the ability to vary inference complexity within a single trained model. The weights learned for merging the branches indicate how local and global dependencies are utilized in different layers, which can inform model design.
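To make the two-branch design concrete, below is a minimal PyTorch sketch of one Branchformer-style encoder layer. This is not the authors' implementation: the module names `ConvGatingMLP` and `BranchformerLayer`, all hyperparameters, and the use of softmax-normalized scalar weights to merge the branches are illustrative assumptions (the paper also studies other merging schemes, such as concatenation followed by a linear projection).

```python
# A minimal sketch of one Branchformer-style encoder layer, assuming PyTorch.
# All module names and hyperparameters are illustrative, not the paper's exact setup.
import torch
import torch.nn as nn


class ConvGatingMLP(nn.Module):
    """cgMLP branch: a channel projection gated by a depthwise convolution,
    which captures local (short-range) context along the time axis."""

    def __init__(self, d_model: int, d_hidden: int, kernel_size: int = 31):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_hidden * 2)  # split into value/gate halves
        self.gate_norm = nn.LayerNorm(d_hidden)
        self.depthwise = nn.Conv1d(
            d_hidden, d_hidden, kernel_size,
            padding=kernel_size // 2, groups=d_hidden,
        )
        self.proj_out = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, d_model)
        x = self.proj_in(self.norm(x))
        value, gate = x.chunk(2, dim=-1)
        # Depthwise conv over time provides the local context for the gate.
        gate = self.depthwise(self.gate_norm(gate).transpose(1, 2)).transpose(1, 2)
        return self.proj_out(value * gate)


class BranchformerLayer(nn.Module):
    """Two parallel branches (global self-attention, local cgMLP) merged by
    learned weights, followed by a residual connection."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, d_hidden: int = 1024):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cgmlp = ConvGatingMLP(d_model, d_hidden)
        # Learned per-branch merge weights (an assumption of this sketch);
        # inspecting them after training shows how each layer balances
        # global vs. local context.
        self.merge_weights = nn.Parameter(torch.zeros(2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = self.attn_norm(x)
        global_out, _ = self.attn(a, a, a, need_weights=False)
        local_out = self.cgmlp(x)
        w = torch.softmax(self.merge_weights, dim=0)
        return x + w[0] * global_out + w[1] * local_out


# Usage: encode a batch of 8 sequences of 100 frames with 256-dim features.
layer = BranchformerLayer()
features = torch.randn(8, 100, 256)
print(layer(features).shape)  # torch.Size([8, 100, 256])
```

A parallel structure like this also hints at the computation-reduction strategies mentioned in the abstract: because the branches are independent, one of them could be skipped at inference time to trade accuracy for speed within a single trained model.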