Paper Title

A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification

Authors

Juan M. Coria, Hervé Bredin, Sahar Ghannay, Sophie Rosset

Abstract

Despite the growing popularity of metric learning approaches, very little work has attempted a fair comparison of these techniques for speaker verification. We try to fill this gap and compare several metric learning loss functions in a systematic manner on the VoxCeleb dataset. The first family of loss functions is derived from the cross-entropy loss (usually used for supervised classification) and includes the congenerous cosine loss, the additive angular margin loss, and the center loss. The second family of loss functions focuses on the similarity between training samples and includes the contrastive loss and the triplet loss. We show that the additive angular margin loss function outperforms all other loss functions in the study, while learning more robust representations. Based on a combination of SincNet trainable features and the x-vector architecture, the network used in this paper, when combined with the additive angular margin loss, brings us a step closer to a truly end-to-end speaker verification system, while still being competitive with the x-vector baseline. In the spirit of reproducible research, we also release open-source Python code for reproducing our results, and share pretrained PyTorch models on torch.hub that can be used either directly or after fine-tuning.
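The two loss families compared in the abstract can be illustrated with minimal, framework-free sketches. Below, `additive_angular_margin_loss` follows the usual ArcFace-style formulation (cross-entropy over scaled cosine logits, with a margin added to the target class angle) and `triplet_loss` is the standard hinge on the anchor-positive vs. anchor-negative distance gap. The `scale` and `margin` defaults here are illustrative, not the hyperparameters used in the paper.

```python
import math

def additive_angular_margin_loss(cosines, target, scale=30.0, margin=0.2):
    """ArcFace-style loss: cross-entropy over scaled cosine logits,
    with an additive margin applied to the target class angle.
    cosines[j] = cos(theta_j) between the embedding and class j's weight."""
    logits = [scale * c for c in cosines]
    theta = math.acos(max(-1.0, min(1.0, cosines[target])))
    logits[target] = scale * math.cos(theta + margin)
    # numerically stable log-softmax cross-entropy on the modified logits
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target]

def triplet_loss(d_anchor_pos, d_anchor_neg, margin=0.1):
    """Similarity-based loss: the anchor-positive distance must be
    smaller than the anchor-negative distance by at least `margin`."""
    return max(0.0, d_anchor_pos - d_anchor_neg + margin)
```

With `margin=0.0`, the first function reduces to plain softmax cross-entropy on the scaled cosine logits, so the effect of the margin can be checked by comparing the two settings on the same input.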
