Paper Title

Playing a Part: Speaker Verification at the Movies

Authors

Andrew Brown, Jaesung Huh, Arsha Nagrani, Joon Son Chung, Andrew Zisserman

Abstract

The goal of this work is to investigate the performance of popular speaker recognition models on speech segments from movies, where often actors intentionally disguise their voice to play a character. We make the following three contributions: (i) We collect a novel, challenging speaker recognition dataset called VoxMovies, with speech for 856 identities from almost 4000 movie clips. VoxMovies contains utterances with varying emotion, accents and background noise, and therefore comprises an entirely different domain to the interview-style, emotionally calm utterances in current speaker recognition datasets such as VoxCeleb; (ii) We provide a number of domain adaptation evaluation sets, and benchmark the performance of state-of-the-art speaker recognition models on these evaluation pairs. We demonstrate that both speaker verification and identification performance drops steeply on this new data, showing the challenge in transferring models across domains; and finally (iii) We show that simple domain adaptation paradigms improve performance, but there is still large room for improvement.
