The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge

Paper title

The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge

Authors

Renyu Wang, Ruilin Tong, Yu Ting Yeung, Xiao Chen

Abstract

This paper describes the system setup of our submission to the speaker diarisation track (Track 4) of the VoxCeleb Speaker Recognition Challenge 2020. Our diarisation system uses a well-trained neural network based speech enhancement model as a pre-processing front-end for the input speech signals. We replace conventional energy-based voice activity detection (VAD) with a neural network based VAD. The neural network based VAD provides more accurate annotation of segments that contain only background music, noise, and other interference, which is crucial to diarisation performance. For speaker clustering, we apply agglomerative hierarchical clustering (AHC) of x-vectors followed by variational Bayesian hidden Markov model (VB-HMM) based iterative clustering. Experimental results demonstrate that our proposed system achieves substantial improvements over the baseline system, yielding a diarisation error rate (DER) of 10.45% and a Jaccard error rate (JER) of 22.46% on the evaluation set.
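To illustrate the AHC step named in the abstract, the following is a minimal sketch only, not the authors' implementation. It assumes average linkage over cosine similarity, a stopping threshold of 0.5, and toy 2-D vectors standing in for x-vectors; none of these choices are taken from the paper, and the function names are hypothetical. The VB-HMM refinement stage is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def ahc(embeddings, threshold=0.5):
    """Average-linkage AHC (assumed linkage and threshold, for illustration).

    Repeatedly merges the most similar pair of clusters and stops when the
    best pair falls below `threshold`. Returns one label per embedding.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(a, b):
        # Average pairwise cosine similarity between members of two clusters.
        sims = [cosine(embeddings[i], embeddings[j]) for i in a for j in b]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -2.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break  # no remaining pair is similar enough to merge
        x, y = pair
        clusters[x].extend(clusters[y])
        del clusters[y]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy segment embeddings: two segments per hypothetical speaker.
toy = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
labels = ahc(toy, threshold=0.5)
```

In a pipeline of the kind the abstract describes, labels like these would serve as the initial speaker assignment that a later iterative clustering stage refines.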
