论文标题
部分可观测时空混沌系统的无模型预测
Rethinking Semantic Segmentation: A Prototype View
论文作者
论文摘要
尽管可以通过考虑软效果权重或查询矢量作为可学习的类原型,但仍可以将普遍的语义分割解决方案放置在一个类别中,尽管具有不同的网络设计(基于FCN或基于注意力)的网络设计(基于FCN或基于注意力)和掩盖解码策略(基于参数SoftMax或基于像素Query)。鉴于这种原型观点,这项研究发现了这种参数分割制度的几个局限性,并提出了基于非可行原型的非参数替代方案。我们的模型不是以完全参数方式学习每个类别的单个权重/查询向量,而是将每个类代表为一组非可学习的原型,仅依赖于该类中几个训练像素的平均特征。因此,通过取回的非参数原型来实现密集的预测。这使我们的模型可以通过优化嵌入式像素和锚定原型之间的布置直接形成像素嵌入空间。它能够以恒定数量的可学习参数来处理任意数量的类。我们从经验上表明,借助基于FCN和基于注意力的分割模型(即HRNET,SWIN,SEGFORMER)和骨干(即Resnet,Hrnet,Swin,Mit),我们的非参数框架可以在多个数据集(即,Ade20k,CityScapes,coco-coco)中产生令人着迷的结果,并表现出巨大的情况,并表现出coco的情况。我们希望这项工作会引起当前事实上的语义分割模型设计的重新考虑。
Prevalent semantic segmentation solutions, despite their different network designs (FCN based or attention based) and mask decoding strategies (parametric softmax based or pixel-query based), can be placed in one category, by considering the softmax weights or query vectors as learnable class prototypes. In light of this prototype view, this study uncovers several limitations of such parametric segmentation regime, and proposes a nonparametric alternative based on non-learnable prototypes. Instead of prior methods learning a single weight/query vector for each class in a fully parametric manner, our model represents each class as a set of non-learnable prototypes, relying solely on the mean features of several training pixels within that class. The dense prediction is thus achieved by nonparametric nearest prototype retrieving. This allows our model to directly shape the pixel embedding space, by optimizing the arrangement between embedded pixels and anchored prototypes. It is able to handle arbitrary number of classes with a constant amount of learnable parameters. We empirically show that, with FCN based and attention based segmentation models (i.e., HRNet, Swin, SegFormer) and backbones (i.e., ResNet, HRNet, Swin, MiT), our nonparametric framework yields compelling results over several datasets (i.e., ADE20K, Cityscapes, COCO-Stuff), and performs well in the large-vocabulary situation. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.