Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation

Paper title

Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation

Authors

Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji, Andros Tjandra, Sakriani Sakti

Abstract

Several solutions for lightweight TTS have shown promising results. Still, they either rely on a hand-crafted design that reaches a suboptimal size or use neural architecture search, which often incurs high training costs. We present Nix-TTS, a lightweight TTS model obtained by distilling knowledge from a high-quality yet large, non-autoregressive, end-to-end (vocoder-free) TTS teacher model. Specifically, we offer module-wise distillation, which enables flexible and independent distillation of the encoder and decoder modules. The resulting Nix-TTS inherits the advantageous properties of being non-autoregressive and end-to-end from the teacher, yet is significantly smaller, with only 5.23M parameters, a reduction of up to 89.34% relative to the teacher model. It also achieves over 3.04x and 8.36x inference speedup on an Intel i7 CPU and a Raspberry Pi 3B, respectively, while retaining fair voice naturalness and intelligibility compared to the teacher model. We provide pretrained models and audio samples of Nix-TTS.
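
The idea of distilling the encoder and decoder independently can be illustrated with a toy sketch. This is not the Nix-TTS implementation: the scalar "teacher" modules, the `fit_linear` helper, and the plain mean-squared-error objective are all assumptions made for illustration, and the sketch does not model the size reduction or the actual losses used in the paper. It only shows that each student module can be fitted against the corresponding teacher module's outputs without training the other one at the same time.

```python
# Toy sketch of module-wise distillation (illustrative only, not the Nix-TTS code).
# Assumption: teacher encoder and decoder are fixed scalar functions, and each
# student module is a scalar linear map fitted with mean squared error.

def teacher_encoder(x):
    # Hypothetical teacher encoder: maps an input feature to a latent value.
    return 2.0 * x + 1.0

def teacher_decoder(z):
    # Hypothetical teacher decoder: maps a latent value to an output value.
    return 3.0 * z - 2.0

def fit_linear(inputs, targets, lr=0.05, steps=2000):
    """Fit y = w * x + b to (inputs, targets) by gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - t) * x for x, t in zip(inputs, targets)) / n
        grad_b = sum(2.0 * (w * x + b - t) for x, t in zip(inputs, targets)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Run the frozen teacher once to collect per-module supervision signals.
features = [-1.0, -0.5, 0.0, 0.5, 1.0]
latents = [teacher_encoder(x) for x in features]
outputs = [teacher_decoder(z) for z in latents]

# Encoder distillation: the student encoder imitates the teacher's latents.
enc_w, enc_b = fit_linear(features, latents)

# Decoder distillation: the student decoder is fed the teacher's latents and
# imitates the teacher's outputs, independently of the student encoder.
dec_w, dec_b = fit_linear(latents, outputs)

def student_tts(x):
    # At inference time the two separately distilled modules are chained.
    z = enc_w * x + enc_b
    return dec_w * z + dec_b
```

Because the two fits share no parameters, either module could be retrained or swapped for a different architecture without touching the other, which is the flexibility the abstract attributes to module-wise distillation.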
