PPG-based singing voice conversion with adversarial representation learning

Paper title

PPG-based singing voice conversion with adversarial representation learning

Authors

Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Ling Xu, Chen Shen, Zejun Ma

Abstract

Singing voice conversion (SVC) aims to convert the voice of one singer to that of another singer while preserving the singing content and melody. Building on recent voice conversion work, we propose a novel model that converts songs steadily while keeping their naturalness and intonation. We build an end-to-end architecture that takes phonetic posteriorgrams (PPGs) as input and generates mel spectrograms. Specifically, we implement two separate encoders: one encodes PPGs as content, and the other compresses mel spectrograms to supply acoustic and musical information. To improve performance on timbre and melody, an adversarial singer confusion module and a mel-regressive representation learning module are designed for the model. Objective and subjective experiments are conducted on our private Chinese singing corpus. Compared with the baselines, our method significantly improves conversion performance in terms of naturalness, melody, and voice similarity. Moreover, our PPG-based method is shown to be robust to noisy sources.
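The abstract names an adversarial singer confusion module but does not spell out its objective. As a rough illustration only, the sketch below shows one common way such an adversarial term is set up: a singer classifier minimizes cross-entropy on an encoder's output, while the encoder is penalized when the classifier succeeds. This is an assumed, generic formulation, not the authors' confirmed method; the function names, the `recon_loss` input, and the `adv_weight` value are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the true singer index."""
    return -math.log(softmax(logits)[target])


def singer_confusion_losses(singer_logits, singer_id, recon_loss, adv_weight=0.1):
    """Return (classifier_loss, encoder_loss) for a single example.

    Assumed generic adversarial setup (not taken from the paper):
    - the singer classifier is trained to minimize cross-entropy;
    - the encoder is trained on the reconstruction loss MINUS a weighted
      cross-entropy, so it is rewarded when the classifier is confused.
    adv_weight is a hypothetical trade-off hyperparameter.
    """
    ce = cross_entropy(singer_logits, singer_id)
    classifier_loss = ce
    encoder_loss = recon_loss - adv_weight * ce
    return classifier_loss, encoder_loss


# Usage: with uniform logits over 4 singers the classifier is fully confused,
# so its cross-entropy equals log(4).
clf_loss, enc_loss = singer_confusion_losses([0.0, 0.0, 0.0, 0.0], 2, recon_loss=1.0)
```

In a real training loop the two losses would update different parameter sets (classifier versus encoder), or a gradient reversal layer would be used to obtain the same effect in a single backward pass.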
