Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Paper title

Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

Authors

Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak

Abstract

Speech systems developed for a particular choice of acoustic domain and sampling frequency do not translate easily to others. The usual practice is to learn domain adaptation and bandwidth extension models independently. In contrast, we propose to learn both tasks together. In particular, we learn to map narrowband conversational telephone speech to wideband microphone speech. We develop parallel and non-parallel learning solutions that utilize both paired and unpaired data. First, we discuss joint and disjoint training of multiple generative models for our tasks. Then, we propose a two-stage learning solution in which a pre-trained domain adaptation system is used for pre-processing in bandwidth extension training. We evaluate our schemes on a speaker verification downstream task. We use the JHU-MIT experimental setup for NIST SRE21, which comprises SRE16, SRE-CTS Superset and SRE21. Our results provide the first evidence that learning both tasks is better than learning just one. On SRE16, our best system achieves a 22% relative improvement in Equal Error Rate (EER) w.r.t. a direct learning baseline and 8% w.r.t. a strong bandwidth expansion system.
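
For readers unfamiliar with the metric, the relative improvements quoted in the abstract are presumably computed as the reduction in EER divided by the baseline EER. The following is a minimal sketch of that arithmetic only. The function name and the EER values are hypothetical placeholders chosen for illustration; they are not results from the paper.

```python
def relative_improvement(baseline_eer: float, system_eer: float) -> float:
    """Return the relative EER reduction of a system over a baseline, in percent.

    Both arguments are EERs in the same unit (for example, percent).
    Lower EER is better, so a positive result means the system improved.
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - system_eer) / baseline_eer * 100.0


# Hypothetical values for illustration only (NOT taken from the paper):
# a baseline at 10.0% EER and a system at 7.8% EER.
example = relative_improvement(10.0, 7.8)
print(f"relative improvement: {example:.1f}%")
```

Note that a relative improvement is not an absolute difference in EER: under these placeholder numbers, a 2.2-point absolute drop corresponds to a reduction of about 22% relative to the baseline.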

Paper title