MANNER: Multi-view Attention Network for Noise Erasure

Paper title

MANNER: Multi-view Attention Network for Noise Erasure

Authors

Park, Hyun Joon; Kang, Byung Ha; Shin, Wooseok; Kim, Jin Sob; Han, Sung Won

Abstract

In the field of speech enhancement, time-domain methods have difficulty achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), which consists of a convolutional encoder-decoder with a multi-view attention block and is applied to time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset in terms of five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while efficiently processing noisy speech.
