Paper Title
Minimizing Sequential Confusion Error in Speech Command Recognition
Paper Authors
Paper Abstract
Speech command recognition (SCR) is commonly used on resource-constrained devices to provide a hands-free user experience. However, in real applications, commands with similar pronunciations are often confused because of the limited capacity of the small models deployed on edge devices, and this drastically degrades the user experience. In this paper, inspired by advances in discriminative training for speech recognition, we propose a novel minimizing sequential confusion error (MSCE) training criterion designed specifically for SCR to alleviate the command confusion problem. Building on the minimum classification error (MCE) discriminative criterion, we aim to improve the model's ability to discriminate the target command from other commands. We define the likelihood of each command through connectionist temporal classification (CTC). During training, instead of using the whole set of non-target commands, we propose several strategies that use prior knowledge to construct a confusing sequence set of similar-sounding commands, which saves training resources and effectively reduces command confusion errors. In particular, we design and compare three different strategies for constructing the confusing set. On our collected speech command set, the proposed method reduces the False Reject Rate (FRR) by 33.7% relative at a False Alarm Rate (FAR) of 0.01, and reduces confusion errors by 18.28%.
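The abstract describes the criterion only at a high level. The following is a minimal pure-Python sketch of an MCE-style loss over CTC command likelihoods. It assumes the classic MCE formulation: a misclassification measure d = -g_target + (1/eta) * log((1/N) * sum_j exp(eta * g_j)) over N competing commands, smoothed by a sigmoid with slope gamma. The discriminant scores g are assumed to be CTC log-likelihoods. The exact MSCE formula and the paper's three confusing-set strategies are not given here. `build_confusing_set`, which uses token edit distance, is therefore a hypothetical stand-in. All function names, the blank index 0, the T x V log-posterior layout, and the hyperparameters `eta`, `gamma` and `max_dist` are illustrative assumptions.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_log_likelihood(log_probs: List[List[float]], labels: Sequence[int], blank: int = 0) -> float:
    """log p(labels | x) via the standard CTC forward algorithm.

    log_probs is a T x V list of per-frame log posteriors (assumed layout)."""
    ext = [blank]
    for label in labels:
        ext += [label, blank]  # blank-interleaved label sequence
    alpha = [NEG_INF] * len(ext)
    alpha[0] = log_probs[0][ext[0]]
    if len(ext) > 1:
        alpha[1] = log_probs[0][ext[1]]
    for frame in log_probs[1:]:
        new_alpha = [NEG_INF] * len(ext)
        for s, token in enumerate(ext):
            cands = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one symbol
            if s >= 2 and token != blank and token != ext[s - 2]:
                cands.append(alpha[s - 2])  # skip a blank between distinct labels
            new_alpha[s] = logsumexp(cands) + frame[token]
        alpha = new_alpha
    # Valid paths end on the last label or on the trailing blank.
    return logsumexp(alpha[-2:]) if len(ext) > 1 else alpha[-1]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def build_confusing_set(target, commands, max_dist=1):
    """HYPOTHETICAL strategy: keep non-target commands within max_dist token edits."""
    target = list(target)
    return [list(c) for c in commands
            if list(c) != target and edit_distance(target, c) <= max_dist]


def sigmoid(z: float) -> float:
    """Overflow-safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mce_loss(log_probs, target, confusing_set, eta=1.0, gamma=1.0, blank=0):
    """Classic MCE loss with CTC log-likelihoods as discriminant scores (assumed form)."""
    if not confusing_set:
        raise ValueError("confusing_set must contain at least one competing command")
    g_target = ctc_log_likelihood(log_probs, target, blank)
    g_comp = [ctc_log_likelihood(log_probs, c, blank) for c in confusing_set]
    # Soft-max "anti-discriminant" over the competing commands.
    anti = (logsumexp([eta * g for g in g_comp]) - math.log(len(g_comp))) / eta
    d = -g_target + anti  # misclassification measure: d > 0 means competitors win
    return sigmoid(gamma * d)


if __name__ == "__main__":
    frames = [[math.log(0.1), math.log(0.8), math.log(0.1)]] * 3  # toy T=3, V=3 posteriors
    print(mce_loss(frames, [1], [[2]]))  # small, because the target dominates
```

In this sketch, restricting the competitors to a small confusing set keeps the number of CTC forward passes per utterance low, which matches the motivation the abstract gives for not using the whole non-target command set. A trainable version would compute the CTC terms in an autodiff framework (for example with PyTorch's `torch.nn.functional.ctc_loss`) so that gradients reach the acoustic model.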