A Transformer-Based Adaptive Prototype Matching Network for Few-Shot Semantic Segmentation

Sihan Chen; Yadang Chen; Yuhui Zheng; Zhi-Xin Yang; Enhua Wu

doi:10.24963/ijcai.2024/73

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence

Main Track. Pages 659-667. https://doi.org/10.24963/ijcai.2024/73

Few-shot semantic segmentation (FSS) aims to learn a model that segments novel classes from only a few annotated samples. Previous FSS methods are sensitive to background noise because of three biases: inherent bias, attention bias, and spatial-aware bias. In this study, we propose a Transformer-Based Adaptive Prototype Matching Network that establishes robust matching relationships by improving the semantic and spatial perception of query features. The model consists of three modules: a target enhancement module (TEM), a dual constraint aggregation module (DCAM), and a dual classification module (DCM). TEM mitigates inherent bias by exploring the relevance of multi-scale local context to enhance foreground features. DCAM then addresses attention bias through a dual semantic-aware attention mechanism that strengthens constraints. Finally, DCM decouples the segmentation task into semantic alignment and spatial alignment to alleviate spatial-aware bias. Extensive experiments on PASCAL-5ⁱ and COCO-20ⁱ confirm the effectiveness of our approach.
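
Prototype matching, which this network builds on, is a common baseline in the FSS literature. A class prototype is pooled from the support features under the support mask, and every query pixel is then compared against it. The sketch below shows only that generic baseline: masked average pooling, cosine similarity, and a fixed threshold. It is not the paper's TEM, DCAM, or DCM, whose details the abstract does not give. All function names and the 0.5 threshold are hypothetical. Feature maps are plain nested lists (channel, row, column), so the example needs only the standard library.

```python
import math
from typing import List

# Hypothetical layout: a feature map is channel-first, feature[c][y][x].
FeatureMap = List[List[List[float]]]
Mask = List[List[float]]


def masked_average_pooling(support_feat: FeatureMap, support_mask: Mask) -> List[float]:
    """Average each support channel over the foreground pixels (mask value 1)."""
    area = sum(sum(row) for row in support_mask)
    if area == 0:
        raise ValueError("support mask has no foreground pixels")
    prototype = []
    for channel in support_feat:
        total = sum(
            value * weight
            for feat_row, mask_row in zip(channel, support_mask)
            for value, weight in zip(feat_row, mask_row)
        )
        prototype.append(total / area)
    return prototype


def cosine_similarity_map(query_feat: FeatureMap, prototype: List[float]) -> List[List[float]]:
    """Cosine similarity between the prototype and each query pixel's feature vector."""
    height, width = len(query_feat[0]), len(query_feat[0][0])
    proto_norm = math.sqrt(sum(p * p for p in prototype))
    sim = []
    for y in range(height):
        row = []
        for x in range(width):
            vec = [channel[y][x] for channel in query_feat]
            dot = sum(v * p for v, p in zip(vec, prototype))
            denom = math.sqrt(sum(v * v for v in vec)) * proto_norm
            # Zero-norm vectors get similarity 0 instead of dividing by zero.
            row.append(dot / denom if denom > 0 else 0.0)
        sim.append(row)
    return sim


def predict_mask(query_feat: FeatureMap, prototype: List[float],
                 threshold: float = 0.5) -> List[List[int]]:
    """Label a query pixel foreground (1) when its similarity exceeds the threshold."""
    sim = cosine_similarity_map(query_feat, prototype)
    return [[1 if s > threshold else 0 for s in row] for row in sim]


# Toy 2-channel, 2x2 example: the support foreground is the left column.
support_feat = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
support_mask = [[1, 0], [1, 0]]
query_feat = [[[0.0, 0.9], [0.0, 0.0]], [[1.0, 0.1], [1.0, 1.0]]]

prototype = masked_average_pooling(support_feat, support_mask)
print(prototype)                                # [1.0, 0.0]
print(predict_mask(query_feat, prototype))      # [[0, 1], [0, 0]]
```

Because a single pooled prototype discards spatial layout and is easily swayed by background clutter, this baseline is exactly where the biases named in the abstract arise. The paper's modules refine the features and the matching beyond this simple scheme.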

Keywords:

Computer Vision: CV: Segmentation

Computer Vision: CV: Representation learning

Computer Vision: CV: Scene analysis and understanding

Computer Vision: CV: Transfer, low-shot, semi- and unsupervised learning