EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

Paper Title

EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning

Authors

Li-Chin Chen, Po-Hsun Chen, Richard Tzong-Han Tsai, Yu Tsao

Abstract

Speech generation and enhancement based on articulatory movements facilitate communication when the scope of verbal communication is absent, e.g., in patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), which is a monitoring technique that records contact between the tongue and hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that utilizes EPG and speech signals for speech generation and enhancement. Different fusion strategies based on multiple combinations of EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals. Further, the addition of noisy speech signals is observed to improve quality and intelligibility. Additionally, EPG2S is observed to achieve high-quality speech enhancement based solely on audio signals, with the addition of EPG signals further improving the performance. The late fusion strategy is deemed to be the most effective approach for simultaneous speech generation and enhancement.
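
The abstract contrasts fusion strategies without describing them. As a rough illustration only, the sketch below shows the general difference between early fusion (concatenating the raw inputs before encoding) and late fusion (encoding each modality separately and concatenating the embeddings). It is not the EPG2S architecture: the layer sizes, the 62-electrode EPG frame, the 257-bin spectral frame, and the helper names `dense`, `early_fusion`, and `late_fusion` are all assumptions made for this example, and the weights are random and untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration and not taken from the paper.
N_FRAMES, EPG_DIM, AUDIO_DIM, HIDDEN, OUT_DIM = 100, 62, 257, 32, 257


def dense(in_dim, out_dim):
    """Return a random (untrained) linear layer followed by tanh."""
    weights = rng.standard_normal((in_dim, out_dim)) * 0.1
    return lambda x: np.tanh(x @ weights)


def early_fusion(epg, audio):
    """Concatenate the raw inputs first, then encode them jointly."""
    encoder = dense(EPG_DIM + AUDIO_DIM, HIDDEN)
    decoder = dense(HIDDEN, OUT_DIM)
    return decoder(encoder(np.concatenate([epg, audio], axis=1)))


def late_fusion(epg, audio):
    """Encode each modality separately, then concatenate the embeddings."""
    epg_encoder = dense(EPG_DIM, HIDDEN)
    audio_encoder = dense(AUDIO_DIM, HIDDEN)
    decoder = dense(2 * HIDDEN, OUT_DIM)
    fused = np.concatenate([epg_encoder(epg), audio_encoder(audio)], axis=1)
    return decoder(fused)


# Synthetic stand-ins: binary tongue-palate contact frames and noisy spectra.
epg_frames = rng.integers(0, 2, size=(N_FRAMES, EPG_DIM)).astype(float)
noisy_spectra = np.abs(rng.standard_normal((N_FRAMES, AUDIO_DIM)))

early_out = early_fusion(epg_frames, noisy_spectra)
late_out = late_fusion(epg_frames, noisy_spectra)
```

Both functions map per-frame inputs to one output frame per input frame. In the late-fusion variant each modality has its own encoder, so either input could be zeroed out while the other branch still contributes an embedding, which is one common motivation for that design.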
