Paper Title

Adversarial Permutation Invariant Training for Universal Sound Separation

Authors

Postolache, Emilian, Pons, Jordi, Pascual, Santiago, Serrà, Joan

Abstract

Universal sound separation consists of separating mixes with arbitrary sounds of different types, and permutation invariant training (PIT) is used to train source agnostic models that do so. In this work, we complement PIT with adversarial losses but find it challenging with the standard formulation used in speech source separation. We overcome this challenge with a novel I-replacement context-based adversarial loss, and by training with multiple discriminators. Our experiments show that by simply improving the loss (keeping the same model and dataset) we obtain a non-negligible improvement of 1.4 dB SI-SNRi in the reverberant FUSS dataset. We also find adversarial PIT to be effective at reducing spectral holes, ubiquitous in mask-based separation models, which highlights the potential relevance of adversarial losses for source separation.
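The permutation invariant training (PIT) mentioned in the abstract scores every possible assignment of model outputs to ground-truth sources and backpropagates only the best one, so the model need not commit to a fixed output order. A minimal toy sketch of that idea is below; it uses plain MSE in NumPy for illustration, whereas actual systems typically use SI-SNR-style losses on waveforms, and this is not the paper's code.

```python
import itertools
import numpy as np

def pit_mse(estimates, targets):
    """Permutation invariant MSE: evaluate every source-to-target
    assignment and return the lowest loss and its permutation."""
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        # Mean error when estimate i is matched to target perm[i].
        loss = np.mean([np.mean((estimates[i] - targets[p]) ** 2)
                        for i, p in enumerate(perm)])
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Toy example: two 1-D "sources" whose estimates come out in swapped order.
t = np.linspace(0.0, 1.0, 8)
targets = [np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)]
estimates = [targets[1].copy(), targets[0].copy()]  # swapped
loss, perm = pit_mse(estimates, targets)
print(loss, perm)  # near-zero loss under the swapping permutation (1, 0)
```

Because the minimum over permutations is taken, the swapped outputs above incur (near) zero loss, which is exactly what makes the training source-agnostic. Note the brute-force search is factorial in the number of sources; it is fine for the handful of sources considered here.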
