Scaling Up Online Speech Recognition Using ConvNets

Paper Title

Scaling Up Online Speech Recognition Using ConvNets

Authors

Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

Abstract

We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC). We improve the core TDS architecture in order to limit the future context and hence reduce latency while maintaining accuracy. The system has almost three times the throughput of a well-tuned hybrid ASR baseline while also having lower latency and a better word error rate. Also important to the efficiency of the recognizer is our highly optimized beam search decoder. To show the impact of our design choices, we analyze throughput, latency, accuracy, and discuss how these metrics can be tuned based on the user requirements.
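The latency argument in the abstract rests on one idea: if every convolution sees only a bounded number of future frames, the whole network only has to wait a bounded time before emitting output for the current frame. The sketch below illustrates that idea under simplifying assumptions; it is not the paper's TDS architecture. It assumes single-channel, stride-1 convolutions with zero padding and a 10 ms frame shift, and the names `conv1d_limited_future`, `total_lookahead_frames`, `FRAME_SHIFT_MS` and the example layer list are hypothetical. The paper's actual TDS blocks, channel counts and strides are not reproduced here.

```python
"""Sketch: bounding future context in 1-D convolutions to bound latency.

Assumptions (not taken from the paper): single-channel, stride-1
convolutions with zero padding, and a 10 ms frame shift.
"""

FRAME_SHIFT_MS = 10  # hypothetical feature frame shift


def conv1d_limited_future(x, weights, future):
    """Single-channel 1-D convolution whose output at frame t depends only on
    x[t - past] ... x[t + future], where past = len(weights) - 1 - future."""
    k = len(weights)
    if not 0 <= future <= k - 1:
        raise ValueError("future must lie in [0, len(weights) - 1]")
    past = k - 1 - future
    # Asymmetric zero padding: `past` frames on the left, `future` on the right,
    # so the output has the same length as the input.
    padded = [0.0] * past + [float(v) for v in x] + [0.0] * future
    return [
        sum(w * padded[t + j] for j, w in enumerate(weights))
        for t in range(len(x))
    ]


def total_lookahead_frames(layers):
    """Total future context, in input frames, of a stack of convolutions.

    `layers` is a list of (future, stride) pairs ordered from input to output.
    A layer's future context is counted in its own input frames, so it is
    scaled by the product of the strides of all earlier layers. This ignores
    any extra alignment delay a strided layer can add.
    """
    total = 0
    cum_stride = 1
    for future, stride in layers:
        total += future * cum_stride
        cum_stride *= stride
    return total


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    # Each output sums one past frame, the current frame and one future frame.
    print(conv1d_limited_future(x, [1.0, 1.0, 1.0], future=1))
    # Hypothetical 3-layer stack: (future frames, stride) per layer.
    layers = [(5, 2), (1, 1), (1, 1)]
    frames = total_lookahead_frames(layers)
    print(frames, "frames =", frames * FRAME_SHIFT_MS, "ms")  # 9 frames = 90 ms
```

Setting `future=0` gives a fully causal convolution with no lookahead; raising it buys right context (often helpful for accuracy) at the cost of latency that accumulates across layers, scaled by any earlier downsampling. This is the kind of accuracy versus latency trade-off the abstract says can be tuned to user requirements.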
