Less is more: Faster and better music version identification with embedding distillation

Paper title

Less is more: Faster and better music version identification with embedding distillation

Authors

Furkan Yesiler, Joan Serrà, Emilia Gómez

Abstract

Version identification systems aim to detect different renditions of the same underlying musical composition (loosely called cover songs). By learning to encode entire recordings into plain vector embeddings, recent systems have made significant progress in bridging the gap between accuracy and scalability, which has been a key challenge for nearly two decades. In this work, we propose to further narrow this gap by employing a set of data distillation techniques that reduce the embedding dimensionality of a pre-trained state-of-the-art model. We compare a wide range of techniques and propose new ones, from classical dimensionality reduction to more sophisticated distillation schemes. With those, we obtain 99% smaller embeddings that, moreover, yield up to a 3% accuracy increase. Such small embeddings can have an important impact in retrieval time, up to the point of making a real-world system practical on a standalone laptop.
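As a rough illustration of the "classical dimensionality reduction" end of the spectrum mentioned in the abstract, the sketch below fits a PCA projection to a set of embeddings and then ranks recordings by cosine similarity in the reduced space. It is not the authors' method: the data are random stand-ins, the sizes (200 recordings, 1600 dimensions reduced to 16, i.e. 99% smaller) are hypothetical, and the helper names `fit_pca`, `project`, and `rank_by_cosine` are invented for this example.

```python
import numpy as np

def fit_pca(embeddings, n_components):
    """Return (mean, components) of a PCA projection fitted via SVD."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(embeddings, mean, components):
    """Project onto the principal directions and L2-normalise each row."""
    reduced = (embeddings - mean) @ components.T
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.maximum(norms, 1e-12)

def rank_by_cosine(query, corpus):
    """Indices of corpus rows, most similar first (rows assumed unit-norm)."""
    return np.argsort(-(corpus @ query))

# Assumption: random vectors stand in for embeddings from a pre-trained model.
rng = np.random.default_rng(0)
original = rng.normal(size=(200, 1600))

mean, components = fit_pca(original, n_components=16)
small = project(original, mean, components)

# Retrieval now compares 16 numbers per recording instead of 1600.
ranking = rank_by_cosine(small[0], small)
size_reduction = 1 - small.shape[1] / original.shape[1]
```

On random data the ranking itself is meaningless; the point is only the shape of the pipeline: fit a projection once, store the small vectors, and run retrieval as a single matrix-vector product over them.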
