A Memory Augmented Architecture for Continuous Speaker Identification in Meetings

Paper title

A Memory Augmented Architecture for Continuous Speaker Identification in Meetings

Authors

Nikolaos Flemotomos, Dimitrios Dimitriadis

Abstract

We introduce and analyze a novel approach to the problem of speaker identification in multi-party recorded meetings. Given a speech segment and a set of available candidate profiles, we propose a novel data-driven way to model the distance relations between them, aiming at identifying the speaker label corresponding to that segment. To achieve this we employ a recurrent, memory-based architecture, since this class of neural networks has been shown to yield advanced performance in problems requiring relational reasoning. The proposed encoding of distance relations is shown to outperform traditional distance metrics, such as the cosine distance. Additional improvements are reported when the temporal continuity of the audio signals and the speaker changes is also modeled. We evaluate our method on two different tasks, i.e. scripted and real-world business meeting scenarios, where we report a relative reduction in speaker error rate of 39.28% and 51.84%, respectively, compared to the baseline.
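
To make the task concrete, the following is a minimal sketch of the traditional cosine-distance approach that the abstract names as a point of comparison. It is not the proposed memory-based architecture. The embedding vectors, the speaker names, and the helper functions `cosine_distance` and `identify_speaker` are hypothetical and chosen only for illustration; in practice the segment and the profiles would be speaker embeddings produced by some upstream model.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def identify_speaker(segment, profiles):
    """Pick the candidate profile closest to the segment in cosine distance.

    `profiles` maps a speaker label to a profile vector (hypothetical format).
    """
    return min(profiles, key=lambda name: cosine_distance(segment, profiles[name]))

# Toy, made-up embeddings: two candidate profiles and one speech segment.
profiles = {
    "alice": [1.0, 0.0, 0.2],
    "bob": [0.0, 1.0, 0.1],
}
segment = [0.9, 0.1, 0.3]

label = identify_speaker(segment, profiles)
print(label)
```

Each segment here is labeled independently with a fixed metric. According to the abstract, the proposed method instead learns the distance relations from data and also models temporal continuity across segments.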
