Paper Title

Spatio-channel Attention Blocks for Cross-modal Crowd Counting

Authors

Youjia Zhang, Soyun Choi, Sungeun Hong

Abstract

Crowd counting research has made significant advancements in real-world applications, but it remains a formidable challenge in cross-modal settings. Most existing methods rely solely on the optical features of RGB images, ignoring the feasibility of other modalities such as thermal and depth images. The inherently significant differences between the different modalities and the diversity of design choices for model architectures make cross-modal crowd counting more challenging. In this paper, we propose Cross-modal Spatio-Channel Attention (CSCA) blocks, which can be easily integrated into any modality-specific architecture. The CSCA blocks first spatially capture global functional correlations among multi-modality with less overhead through spatial-wise cross-modal attention. Cross-modal features with spatial attention are subsequently refined through adaptive channel-wise feature aggregation. In our experiments, the proposed block consistently shows significant performance improvement across various backbone networks, resulting in state-of-the-art results in RGB-T and RGB-D crowd counting.
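The abstract describes CSCA as a two-stage block: spatial-wise cross-modal attention between the RGB stream and an auxiliary modality (thermal or depth), followed by adaptive channel-wise feature aggregation. Below is a minimal PyTorch sketch of that structure, assuming a two-stream backbone; the class name, the naive dot-product attention, and the gating scheme are illustrative assumptions rather than the authors' released implementation (the paper reports a lower-overhead attention formulation).

```python
# Minimal sketch of a CSCA-style block, assuming PyTorch.
# Names and design details are illustrative, not the authors' code.
import torch
import torch.nn as nn


class SpatioChannelAttentionBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # 1x1 projections for query/key/value used by spatial-wise cross-modal attention
        self.query = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Channel-wise gate for adaptive aggregation of the two streams
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, aux_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, aux_feat: (B, C, H, W) features from modality-specific backbones
        b, c, h, w = rgb_feat.shape
        q = self.query(rgb_feat).flatten(2).transpose(1, 2)   # (B, HW, C/r)
        k = self.key(aux_feat).flatten(2)                      # (B, C/r, HW)
        v = self.value(aux_feat).flatten(2).transpose(1, 2)    # (B, HW, C)
        # Spatial-wise cross-modal attention: RGB positions attend over the auxiliary modality.
        # A plain HW x HW dot-product attention is used here only for illustration.
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)     # (B, HW, HW)
        cross = (attn @ v).transpose(1, 2).reshape(b, c, h, w)         # (B, C, H, W)
        # Adaptive channel-wise feature aggregation of the RGB and cross-modal features
        gate = self.channel_gate(torch.cat([rgb_feat, cross], dim=1))  # (B, C, 1, 1)
        return gate * rgb_feat + (1.0 - gate) * cross
```

In use, one such block would be inserted after corresponding stages of the RGB and thermal/depth backbones, with the fused output passed on to the density-estimation head of whichever counting network it is plugged into.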
