Paper Title

MANNER: Multi-view Attention Network for Noise Erasure

Authors

Park, Hyun Joon; Kang, Byung Ha; Shin, Wooseok; Kim, Jin Sob; Han, Sung Won

Abstract

In speech enhancement, time-domain methods struggle to achieve both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), which consists of a convolutional encoder-decoder with a multi-view attention block and is applied to time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset using five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while efficiently processing noisy speech.
