MISA: MIning Saliency-Aware Semantic Prior for Box Supervised Instance Segmentation

Hao Zhu, Yan Zhu, Jiayu Xiao, Yike Ma, Yucheng Zhang, Jintao Li, Feng Dai

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1798-1806. https://doi.org/10.24963/ijcai.2024/199

Box supervised instance segmentation (BSIS) aims to achieve an effective trade-off between annotation cost and model performance by relying solely on bounding box annotations during training. However, we observe that a BSIS model is bottlenecked by its intricate objective under limited guidance, and tends to sacrifice segmentation capability in order to recognize multiple instances effectively. To boost the BSIS model's perception of object shape and contour, we introduce MISA (MIning Saliency-Aware semantic prior), which mines a saliency-aware semantic prior from a well-optimized box supervised semantic segmentation (BSSS) network and incorporates cross-model guidance into the learning process of BSIS. Specifically, we first design a Frequency-Space Distillation (FSD) module to extract assorted salient prior knowledge from the BSSS model and to perform cross-model alignment that transfers this prior to the BSIS model. Furthermore, we introduce Semantic-Enhanced Pairwise Affinity (SEPA), which borrows the object perception ability of the BSSS model to emphasize the contribution of salient objects to pairwise affinity, providing more accurate guidance for the BSIS network. Extensive experiments show that our proposed MISA consistently surpasses existing state-of-the-art methods by a large margin in the BSIS scenario.
Keywords:
Computer Vision: CV: Segmentation
Computer Vision: CV: Transfer, low-shot, semi- and un- supervised learning
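
The abstract describes SEPA only at a high level, so the following is a minimal sketch of what a saliency-weighted pairwise affinity term might look like, not the authors' implementation. It assumes a commonly used color-similarity pairwise loss between neighboring pixels and reweights each pair by a saliency score taken from a semantic model. The function name `pairwise_affinity_loss`, the parameters `tau` and `theta`, and the choice of averaging the two saliency values are all hypothetical and introduced here for illustration.

```python
import math


def pairwise_affinity_loss(probs, colors, saliency, tau=0.3, theta=2.0):
    """Saliency-weighted pairwise affinity loss on a 2-D grid (illustrative only).

    probs    : foreground probabilities predicted by the instance model, in [0, 1]
    colors   : one scalar intensity per pixel (a stand-in for real color features)
    saliency : per-pixel weights in [0, 1], assumed to come from a semantic model
    tau      : hypothetical similarity threshold below which a pair is ignored
    theta    : hypothetical temperature of the color-similarity kernel
    """
    h, w = len(probs), len(probs[0])
    total, weight_sum = 0.0, 0.0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and down neighbours
                ny, nx = y + dy, x + dx
                if ny >= h or nx >= w:
                    continue
                # Color similarity in (0, 1]; dissimilar pairs give no supervision.
                sim = math.exp(-abs(colors[y][x] - colors[ny][nx]) / theta)
                if sim < tau:
                    continue
                p, q = probs[y][x], probs[ny][nx]
                # Probability that both pixels receive the same label.
                same = p * q + (1 - p) * (1 - q)
                # Assumption: weight the pair by the mean saliency of its pixels.
                wgt = 0.5 * (saliency[y][x] + saliency[ny][nx])
                total += -wgt * math.log(max(same, 1e-12))
                weight_sum += wgt
    return total / weight_sum if weight_sum > 0 else 0.0
```

Under these assumptions, pairs of similar-colored pixels inside salient regions dominate the loss, while pairs in regions with zero saliency contribute nothing. A real implementation would operate on image tensors and larger neighborhoods rather than nested Python lists.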