Self-supervised Semantic Segmentation Grounded in Visual Concepts
Wenbin He, William Surmeier, Arvind Kumar Shekar, Liang Gou, Liu Ren

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 949-955. https://doi.org/10.24963/ijcai.2022/133

Unsupervised semantic segmentation requires assigning a label to every pixel without any human annotations. Despite recent advances in self-supervised representation learning for individual images, unsupervised semantic segmentation with pixel-level representations remains a challenging and underexplored task. In this work, we propose a self-supervised pixel representation learning method for semantic segmentation that uses visual concepts (i.e., groups of pixels with semantic meanings, such as parts, objects, and scenes) extracted from images. To guide self-supervised learning, we leverage three types of relationships between pixels and concepts: the relationships between pixels and local concepts, between local and global concepts, and the co-occurrence of concepts. We evaluate the learned pixel embeddings and visual concepts on three datasets: PASCAL VOC 2012, COCO 2017, and DAVIS 2017. Our results show that the proposed method achieves consistent and substantial improvements over recent unsupervised semantic segmentation approaches, and also demonstrate that visual concepts can reveal insights into image datasets.
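To make the first relationship type concrete, the sketch below shows one plausible way to pull each pixel embedding toward its assigned local-concept prototype with an InfoNCE-style contrastive loss. This is a minimal illustration under assumed inputs (precomputed pixel embeddings and concept assignments), not the paper's actual formulation; the function name, temperature, and prototype construction are all hypothetical.

```python
import numpy as np

def pixel_to_concept_loss(pixel_emb, concept_assign, tau=0.1):
    """InfoNCE-style loss (hypothetical sketch): pull each pixel toward
    its assigned local-concept prototype, push it from the others.

    pixel_emb:      (n_pixels, d) array of pixel embeddings.
    concept_assign: (n_pixels,) integer concept id per pixel.
    """
    k = concept_assign.max() + 1
    # Prototype of each concept = mean embedding of its pixels.
    protos = np.stack([pixel_emb[concept_assign == c].mean(axis=0)
                       for c in range(k)])
    # L2-normalize so the dot product is cosine similarity.
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=1, keepdims=True)
    logits = p @ protos.T / tau                  # (n_pixels, k) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each pixel's own concept.
    return -log_prob[np.arange(len(p)), concept_assign].mean()

# Toy example: 6 pixels in 4-d space, split across 2 local concepts.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
assign = np.array([0, 0, 0, 1, 1, 1])
loss = pixel_to_concept_loss(emb, assign)
```

The same contrastive template could, in principle, be reused for the other two relationship types by swapping what plays the role of anchor and positive (local vs. global concept prototypes, or co-occurring concept pairs).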
Keywords:
Computer Vision: Segmentation
AI Ethics, Trust, Fairness: Explainability and Interpretability
Computer Vision: Interpretability and Transparency
Computer Vision: Representation Learning
Machine Learning: Self-supervised Learning