A Consistency and Integration Model with Adaptive Thresholds for Weakly Supervised Object Localization

Hao Su, Meng Yang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1281-1289. https://doi.org/10.24963/ijcai.2024/142

Weakly Supervised Object Localization (WSOL) is a challenging task that aims to learn object localization from less costly image-level labels. Existing convolutional neural network (CNN) based methods tend to focus on the discriminative regions of objects, while transformer-based methods overemphasize deep global features that are powerful for classification and lack the capability to perceive object details, so their predictions fall far from the object boundary. In this paper, we propose a novel Consistency and Integration Model with Adaptive Thresholds (CIAT) that exploits the spatial-semantic consistency between shallow and deep features to activate more object regions and detects the object regions adaptively in different images. First, we introduce a simple plug-and-play consistency and integration module of shallow-deep features (CISD), which uses shallow features efficiently to enhance perception of the entire object. Then, we design an online adaptive threshold (OAT) based on Bayesian decision theory, which computes a reasonable segmentation threshold adapted to the localization map of each image, bringing the predicted bounding box closer to the ground truth. Extensive experiments on two widely used datasets, CUB-200-2011 and ILSVRC, verify the effectiveness of our method.
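To make the idea of a per-image threshold concrete, the sketch below picks a threshold for a single localization map and derives a bounding box from it. It is not the OAT formulation from the paper, which the abstract does not spell out. As a stand-in it uses the classical Kittler-Illingworth minimum-error criterion, which models foreground and background scores as two Gaussians and minimizes an estimate of the Bayes classification error. The function names, the bin count, and the synthetic map are illustrative assumptions, and the map is assumed to be scaled to [0, 1].

```python
import numpy as np


def min_error_threshold(loc_map, bins=64):
    """Kittler-Illingworth minimum-error threshold for a map scaled to [0, 1].

    Illustrative stand-in for a Bayes-motivated threshold; not the paper's OAT.
    """
    hist, edges = np.histogram(loc_map.ravel(), bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])

    best_t, best_j = 0.5, np.inf  # fall back to 0.5 if no split is valid
    for k in range(1, bins):
        p_bg, p_fg = p[:k].sum(), p[k:].sum()
        if p_bg < 1e-6 or p_fg < 1e-6:
            continue  # one class is empty
        mu_bg = (p[:k] * centers[:k]).sum() / p_bg
        mu_fg = (p[k:] * centers[k:]).sum() / p_fg
        var_bg = (p[:k] * (centers[:k] - mu_bg) ** 2).sum() / p_bg
        var_fg = (p[k:] * (centers[k:] - mu_fg) ** 2).sum() / p_fg
        if var_bg < 1e-12 or var_fg < 1e-12:
            continue  # degenerate class, log-variance undefined
        # Criterion J(T) = 1 + 2*sum(P_i ln sigma_i) - 2*sum(P_i ln P_i)
        j = (1.0
             + p_bg * np.log(var_bg) + p_fg * np.log(var_fg)
             - 2.0 * (p_bg * np.log(p_bg) + p_fg * np.log(p_fg)))
        if j < best_j:
            best_j, best_t = j, float(edges[k])
    return best_t


def box_from_map(loc_map, threshold):
    """Tight (x_min, y_min, x_max, y_max) box around pixels >= threshold."""
    ys, xs = np.nonzero(loc_map >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Synthetic localization map: low-score background, high-score rectangle.
yy, xx = np.mgrid[0:32, 0:32]
demo = 0.2 + 0.05 * np.sin(xx + yy)
demo[8:20, 10:26] += 0.6

t = min_error_threshold(demo)
box = box_from_map(demo, t)
print(t, box)
```

Because the threshold is recomputed from each map's own score distribution, a map with weak activations and a map with strong activations get different cut points, which is the behavior a fixed global threshold cannot provide.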
Keywords:
Computer Vision: CV: Recognition (object detection, categorization)
Computer Vision: CV: Interpretability and transparency
Computer Vision: CV: Transfer, low-shot, semi- and un- supervised learning
Computer Vision: CV: Segmentation