Paper title
MANNER: Multi-view Attention Network for Noise Erasure
Paper authors
Paper abstract
In the field of speech enhancement, time-domain methods have difficulty achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), which consists of a convolutional encoder-decoder with a multi-view attention block and is applied to time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset in terms of five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while efficiently processing noisy speech.