Attention Shifting to Pursue Optimal Representation for Adapting Multi-granularity Tasks

Gairui Bai, Wei Xi, Yihan Zhao, Xinhui Liu, Jizhong Zhao

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 587-595. https://doi.org/10.24963/ijcai.2024/65

Object recognition in open environments, e.g., video surveillance, poses significant challenges because it involves unknown categories and multi-granularity tasks (MGT). Recent methods exhibit limitations: they struggle to capture subtle differences between different parts within an object and to handle MGT adaptively. To address these limitations, this paper proposes a Class-semantic Guided Attention Shift (SegAS) method. SegAS transforms adaptive MGT into dynamic combinations of invariant discriminant representations across different levels, effectively enhancing adaptability to multi-granularity downstream tasks. Specifically, SegAS incorporates a hardness-based Attention Part Filtering Strategy (ApFS) to dynamically decompose objects into complementary parts based on the object structure and relevance to the instance. SegAS then shifts attention to the optimal discriminant region of each part under the guidance of hierarchical class semantics. Finally, a diversity loss is employed to emphasize the importance and distinctness of the different partial features. Extensive experiments validate the effectiveness of SegAS on three multi-granularity recognition tasks.
Keywords:
Computer Vision: CV: Representation learning
Computer Vision: CV: Recognition (object detection, categorization)
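
The abstract does not state the diversity loss in closed form. As a rough illustration only, the PyTorch-style sketch below shows one common way to penalize redundancy among part-level features by minimizing their pairwise cosine similarity; the function name diversity_loss and the (batch, parts, dim) feature layout are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn.functional as F

def diversity_loss(part_features: torch.Tensor) -> torch.Tensor:
    # part_features: (B, P, D) -- B samples, P complementary parts, D feature dims.
    # Hypothetical formulation: mean positive pairwise cosine similarity between
    # different parts; the paper's exact loss may differ.
    feats = F.normalize(part_features, dim=-1)        # unit-normalize each part vector
    sim = torch.matmul(feats, feats.transpose(1, 2))  # (B, P, P) cosine similarity matrix
    P = feats.size(1)
    eye = torch.eye(P, device=sim.device, dtype=sim.dtype)
    off_diag = sim * (1.0 - eye)                      # drop self-similarity terms
    # Average over the P*(P-1) off-diagonal pairs and over the batch.
    return off_diag.clamp(min=0).sum(dim=(1, 2)).mean() / (P * (P - 1))

Minimizing a term of this kind pushes the P part features toward mutually distinct directions, which matches the abstract's stated goal of emphasizing the distinction between different partial features.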