Paper title

Improving noise robust automatic speech recognition with single-channel time-domain enhancement network

Authors

Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani

Abstract

With the advent of deep learning, research on noise-robust automatic speech recognition (ASR) has progressed rapidly. However, the ASR performance of single-channel systems in noisy conditions remains unsatisfactory. Indeed, most single-channel speech enhancement (SE) methods (denoising) have brought only limited performance gains over a state-of-the-art ASR back-end trained on multi-condition training data. Recently, there has been much research on neural network-based SE methods working in the time domain, showing levels of performance never attained before. However, it has not been established whether the high enhancement performance achieved by such time-domain approaches can be translated into ASR gains. In this paper, we show that a single-channel time-domain denoising approach can significantly improve ASR performance, providing more than 30% relative word error reduction over a strong ASR back-end on the real evaluation data of the single-channel track of the CHiME-4 dataset. These positive results demonstrate that single-channel noise reduction can still improve ASR performance, which should open the door to more research in that direction.
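
The headline figure in the abstract is a relative word error reduction, which may be easier to read with the arithmetic spelled out. The sketch below assumes the usual definitions: word error rate (WER) as the word-level edit distance divided by the number of reference words, and relative reduction as the drop in WER divided by the baseline WER. The function names are hypothetical, and this is not the scoring pipeline used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Fraction of the baseline WER removed by the enhanced system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - enhanced_wer) / baseline_wer
```

Under these definitions, a system that lowers a hypothetical baseline WER of 10% to 7% achieves a relative reduction of 30%. These numbers are illustrative only and are not results from the paper.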
