Improving noise robust automatic speech recognition with single-channel time-domain enhancement network

Paper title

Improving noise robust automatic speech recognition with single-channel time-domain enhancement network

Authors

Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani

Abstract

With the advent of deep learning, research on noise-robust automatic speech recognition (ASR) has progressed rapidly. However, the ASR performance of single-channel systems in noisy conditions remains unsatisfactory. Indeed, most single-channel speech enhancement (SE) methods (denoising) have brought only limited performance gains over state-of-the-art ASR back-ends trained on multi-condition training data. Recently, there has been much research on neural network-based SE methods that operate in the time domain and reach levels of enhancement performance never attained before. However, it has not been established whether the high enhancement performance achieved by such time-domain approaches translates into improved ASR performance. In this paper, we show that a single-channel time-domain denoising approach can significantly improve ASR performance, providing more than 30% relative word error reduction over a strong ASR back-end on the real evaluation data of the single-channel track of the CHiME-4 dataset. These positive results demonstrate that single-channel noise reduction can still improve ASR performance, which should open the door to more research in that direction.
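
As a reading aid, the "relative word error reduction" quoted in the abstract is conventionally computed as the drop in word error rate (WER) divided by the baseline WER. The sketch below illustrates only that arithmetic; the function name and the WER values are hypothetical and are not results taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER.

    Both arguments must use the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical WERs in percent, chosen only to illustrate the arithmetic.
reduction = relative_wer_reduction(baseline_wer=10.0, enhanced_wer=7.0)
print(f"Relative WER reduction: {reduction:.0%}")
```

Under this definition, a system that lowers a hypothetical 10.0% WER to 7.0% achieves a 30% relative reduction, even though the absolute reduction is only 3 percentage points.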
