Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection

Paper Title

Frequency Dynamic Convolution: Frequency-Adaptive Pattern Recognition for Sound Event Detection

Authors

Hyeonuk Nam, Seong-Hu Kim, Byeong-Yun Ko, Yong-Hwa Park

Abstract

2D convolution is widely used in sound event detection (SED) to recognize two-dimensional time-frequency patterns of sound events. However, 2D convolution enforces translation equivariance on sound events along both the time and frequency axes, even though frequency is not a shift-invariant dimension. To improve the physical consistency of 2D convolution for SED, we propose frequency dynamic convolution, which applies kernels that adapt to the frequency components of the input. Frequency dynamic convolution outperforms the baseline by 6.3% on the DESED validation dataset in terms of polyphonic sound detection score (PSDS). It also significantly outperforms other pre-existing content-adaptive methods on SED. In addition, by comparing the class-wise F1 scores of the baseline and frequency dynamic convolution, we show that frequency dynamic convolution is especially effective at detecting non-stationary sound events with intricate time-frequency patterns. From this result, we verify that frequency dynamic convolution is superior at recognizing frequency-dependent patterns.
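The abstract does not spell out how the kernel adapts to frequency, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a small set of basis kernels whose outputs are mixed with softmax weights computed separately for each frequency bin from the time-averaged input. The function names, the affine scoring parameters `scores_w` and `scores_b`, and the single-channel `[freq][time]` layout are all hypothetical choices made for illustration.

```python
import math


def conv2d_same(x, kernel):
    """Plain 2D cross-correlation with zero padding; x is indexed [freq][time]."""
    n_freq, n_time = len(x), len(x[0])
    kf, kt = len(kernel), len(kernel[0])
    pf, pt = kf // 2, kt // 2
    out = [[0.0] * n_time for _ in range(n_freq)]
    for f in range(n_freq):
        for t in range(n_time):
            acc = 0.0
            for i in range(kf):
                for j in range(kt):
                    ff, tt = f + i - pf, t + j - pt
                    if 0 <= ff < n_freq and 0 <= tt < n_time:
                        acc += kernel[i][j] * x[ff][tt]
            out[f][t] = acc
    return out


def softmax(values):
    """Numerically stable softmax over a short list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def frequency_attention(x, scores_w, scores_b):
    """Return one weight per (frequency bin, basis kernel).

    Assumption: each bin is summarized by its time average, scored by a
    hypothetical per-kernel affine map, then normalized across kernels.
    """
    weights = []
    for row in x:
        mean_over_time = sum(row) / len(row)
        logits = [w * mean_over_time + b for w, b in zip(scores_w, scores_b)]
        weights.append(softmax(logits))
    return weights


def frequency_dynamic_conv(x, basis_kernels, scores_w, scores_b):
    """Mix the basis-kernel outputs with weights that vary along frequency."""
    attn = frequency_attention(x, scores_w, scores_b)
    outputs = [conv2d_same(x, k) for k in basis_kernels]
    n_freq, n_time = len(x), len(x[0])
    return [
        [
            sum(attn[f][k] * outputs[k][f][t] for k in range(len(basis_kernels)))
            for t in range(n_time)
        ]
        for f in range(n_freq)
    ]


# Toy spectrogram: 4 frequency bins by 5 time frames.
spec = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 2.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
time_diff = [[0.0, 0.0, 0.0], [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]

out = frequency_dynamic_conv(
    spec, [identity, time_diff], scores_w=[2.0, -2.0], scores_b=[0.0, 0.0]
)
```

Because the mixing weights depend on the frequency bin, the effective kernel differs from bin to bin, while every bin still shares the same kernel along time. A plain 2D convolution is the special case in which all basis kernels are identical.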
