Paper Title
WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration
Paper Authors
Paper Abstract
Denoising diffusion probabilistic models (DDPMs) and generative adversarial networks (GANs) are popular generative models for neural vocoders. DDPMs and GANs are characterized by the iterative denoising framework and adversarial training, respectively. This study proposes a fast and high-quality neural vocoder called \textit{WaveFit}, which integrates the essence of GANs into a DDPM-like iterative framework based on fixed-point iteration. WaveFit iteratively denoises an input signal and trains a deep neural network (DNN) to minimize an adversarial loss calculated from the intermediate outputs at all iterations. Subjective (side-by-side) listening tests showed no statistically significant differences in naturalness between human natural speech and speech synthesized by WaveFit with five iterations. Furthermore, the inference speed of WaveFit was more than 240 times faster than WaveRNN. Audio demos are available at \url{google.github.io/df-conformer/wavefit/}.
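To make the abstract's description more concrete, the following is a minimal sketch of the kind of fixed-point-style refinement loop with a loss over all intermediate outputs that it describes. It is not the paper's implementation: the names (`denoiser`, `adversarial_loss`, `T`) and the placeholder computations are assumptions introduced only for illustration.

```python
# Hypothetical sketch of a WaveFit-style iterative refinement loop:
# the signal is denoised T times, every intermediate output is kept,
# and a loss is accumulated over all of them during training.
# `denoiser` and `adversarial_loss` are stand-ins, not the paper's API.
import numpy as np

T = 5  # number of iterations (the abstract evaluates WaveFit with five)

def denoiser(y, mel):
    """Placeholder DNN: estimates the noise component of waveform y given
    mel-spectrogram conditioning. A real system would use a neural network."""
    return 0.1 * y  # dummy estimate so the sketch runs end to end

def adversarial_loss(y, target):
    """Placeholder for a GAN-style loss on one intermediate output."""
    return float(np.mean((y - target) ** 2))  # stand-in loss term

def wavefit_iterations(y0, mel):
    """Fixed-point-style refinement: each step subtracts the estimated noise
    from the current waveform and records the intermediate result."""
    outputs = []
    y = y0
    for _ in range(T):
        y = y - denoiser(y, mel)   # one denoising / fixed-point update
        outputs.append(y)
    return outputs

# Training-style usage: sum the loss over all T intermediate outputs.
rng = np.random.default_rng(0)
y0 = rng.standard_normal(16000)       # initial noisy waveform
mel = rng.standard_normal((80, 63))   # conditioning mel-spectrogram
target = rng.standard_normal(16000)   # ground-truth waveform (dummy here)
loss = sum(adversarial_loss(y, target) for y in wavefit_iterations(y0, mel))
print(f"total loss over {T} iterations: {loss:.4f}")
```

In this reading, applying the loss to every intermediate output (rather than only the final one) is what ties the GAN-style training to the DDPM-like iterative denoising that the abstract highlights.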