Paper Title

NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation

Paper Authors

Tao Wang, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen

Paper Abstract

Traditional vocoders have the advantages of high synthesis efficiency, strong interpretability, and speech editability, while neural vocoders have the advantage of high synthesis quality. To combine the advantages of the two, inspired by the traditional deterministic plus stochastic model, this paper proposes a novel neural vocoder named NeuralDPS, which retains high speech quality while achieving high synthesis efficiency and noise controllability. First, the framework contains four modules: a deterministic source module, a stochastic source module, a neural V/UV decision module, and a neural filter module. The only input the vocoder requires is the spectral parameters, which avoids errors caused by estimating additional parameters such as F0. Second, to address the problem that different frequency bands may contain different proportions of deterministic and stochastic components, a multiband excitation strategy is used to generate a more accurate excitation signal and reduce the neural filter's burden. Third, a method to control the noise components of speech is proposed, so that the signal-to-noise ratio (SNR) of the speech can be adjusted easily. Objective and subjective experimental results show that our proposed NeuralDPS vocoder obtains performance similar to that of WaveNet while generating waveforms at least 280 times faster than the WaveNet vocoder; it is also 28% faster than WaveGAN in synthesis efficiency on a single CPU core. We have also verified through experiments that this method can effectively control the noise components in the predicted speech and adjust the SNR of the speech. Examples of generated speech can be found at https://hairuo55.github.io/NeuralDPS.
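
The multiband excitation and noise-control ideas described in the abstract can be illustrated with a toy example. The snippet below is a minimal sketch under simplifying assumptions, not the paper's implementation: it assumes a single scalar F0, equal-width frequency bands, and illustrative names such as multiband_excitation, band_voicing, and noise_scale. Each band mixes a deterministic harmonic source with a stochastic noise source, and noise_scale rescales the stochastic part, which is one simple way to adjust the SNR of the resulting excitation.

```python
# Hypothetical sketch of the multiband excitation idea: each frequency band
# mixes a deterministic (periodic) source with a stochastic (noise) source,
# and a global noise scale adjusts the SNR of the result. All names and the
# band-splitting scheme are illustrative, not taken from the paper.
import numpy as np

def multiband_excitation(f0, band_voicing, sr=16000, n_samples=16000,
                         n_bands=4, noise_scale=1.0, seed=0):
    """Mix per-band periodic and noise excitation into one signal.

    f0           : fundamental frequency in Hz (scalar, for simplicity)
    band_voicing : array of shape (n_bands,); 1.0 = fully voiced band,
                   0.0 = fully unvoiced band
    noise_scale  : multiplies the stochastic component, changing the SNR
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / sr

    # Deterministic source: a simple harmonic sum up to the Nyquist frequency.
    deterministic = np.zeros(n_samples)
    n_harmonics = int((sr / 2) // f0)
    for k in range(1, n_harmonics + 1):
        deterministic += np.sin(2 * np.pi * k * f0 * t)
    deterministic /= max(n_harmonics, 1)

    # Stochastic source: white Gaussian noise.
    stochastic = rng.standard_normal(n_samples)

    # Split the spectrum into equal-width bands and mix per band.
    D = np.fft.rfft(deterministic)
    S = np.fft.rfft(stochastic)
    freqs = np.fft.rfftfreq(n_samples, 1 / sr)
    excitation_spec = np.zeros_like(D)
    band_edges = np.linspace(0, sr / 2, n_bands + 1)
    for b in range(n_bands):
        mask = (freqs >= band_edges[b]) & (freqs < band_edges[b + 1])
        v = band_voicing[b]
        excitation_spec[mask] = v * D[mask] + (1 - v) * noise_scale * S[mask]

    return np.fft.irfft(excitation_spec, n=n_samples)

# Example: lower two bands mostly voiced, upper two bands mostly unvoiced;
# halving noise_scale attenuates the stochastic part and raises the SNR.
exc = multiband_excitation(f0=120.0,
                           band_voicing=np.array([1.0, 0.8, 0.3, 0.0]),
                           noise_scale=0.5)
```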
