Paper Title
Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment
Paper Authors
Paper Abstract
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples. This task inherits the main challenges of both few-shot learning and fine-grained recognition. First, the lack of labeled samples makes the learned model prone to overfitting. Second, it also suffers from high intra-class variance and low inter-class differences in the datasets. To address this challenging task, we propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local-to-local (L2L) similarity metric. Specifically, the BAS is introduced to generate a foreground mask that weakens background disturbance and enhances the dominant foreground objects. The FOA then reconstructs the feature map of each support sample according to its correlation to the query ones, which addresses the problem of misalignment between support-query image pairs. To enable the proposed method to capture subtle differences between confusable samples, we present a novel L2L similarity metric that further measures the local similarity between a pair of aligned spatial features in the embedding space. Moreover, considering that background interference degrades robustness, we infer the pairwise similarity of feature maps using both the raw image and the refined image. Extensive experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state of the art by a large margin. The source codes are available at: https://github.com/CSer-Tang-hao/BSFA-FSFG.
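The three components described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that); the feature shapes, the quantile-based mask threshold, and the softmax-correlation alignment are assumptions made for illustration only.

```python
import numpy as np

def bas_mask(feat, keep=0.5):
    """Background activation suppression (sketch): derive a foreground mask
    from the channel-averaged activation map and zero out low-activation
    (assumed background) locations. `keep` is an assumed threshold quantile.
    feat: (C, H, W) feature map."""
    act = feat.mean(axis=0)                      # (H, W) activation map
    mask = (act >= np.quantile(act, keep)).astype(feat.dtype)
    return feat * mask                           # suppress background responses

def foa_align(support, query):
    """Foreground object alignment (sketch): reconstruct the support feature
    map as a correlation-weighted mixture of query locations, so the pair is
    spatially aligned. support, query: (C, H, W)."""
    C, H, W = support.shape
    s = support.reshape(C, -1)                   # (C, HW)
    q = query.reshape(C, -1)                     # (C, HW)
    corr = s.T @ q                               # (HW, HW) support-query correlation
    w = np.exp(corr - corr.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)            # softmax over query locations
    return (w @ q.T).T.reshape(C, H, W)          # each support cell -> query mixture

def l2l_similarity(a, b):
    """Local-to-local metric (sketch): mean cosine similarity between
    spatially corresponding local features. a, b: (C, H, W)."""
    C = a.shape[0]
    av, bv = a.reshape(C, -1), b.reshape(C, -1)
    num = (av * bv).sum(axis=0)
    den = np.linalg.norm(av, axis=0) * np.linalg.norm(bv, axis=0) + 1e-8
    return float((num / den).mean())

# Toy usage: score a support-query pair after suppression and alignment.
rng = np.random.default_rng(0)
support = rng.standard_normal((64, 7, 7))
query = rng.standard_normal((64, 7, 7))
score = l2l_similarity(foa_align(bas_mask(support), bas_mask(query)),
                       bas_mask(query))
```

The abstract also notes that similarity is inferred from both the raw and the refined (mask-and-aligned) images; in this sketch that would amount to averaging `score` with the same metric computed on the unmasked, unaligned features.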