Scene-Adaptive Person Search via Bilateral Modulations

Yimin Jiang, Huibing Wang, Jinjia Peng, Xianping Fu, Yang Wang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 938-946. https://doi.org/10.24963/ijcai.2024/104

Person search aims to localize a specific target person within a gallery of images captured in various scenes. As a moving pedestrian's scene changes, the captured person image inevitably introduces background and foreground noise into the person feature; this noise is unrelated to the person's identity and leads to severe performance degradation. To address this issue, we present a Scene-Adaptive Person Search (SEAS) model that introduces bilateral modulations to simultaneously eliminate scene noise and maintain a consistent person representation across scenes. In SEAS, a Background Modulation Network (BMN) encodes the feature extracted from the detected bounding box into a multi-granularity embedding, suppressing background noise at multiple levels in a norm-aware manner. Additionally, to mitigate the effect of foreground noise on the person feature, SEAS introduces a Foreground Modulation Network (FMN) that computes a clutter-reduction offset for the person embedding from the feature map of the scene image. Through bilateral modulations of both background and foreground in an end-to-end manner, SEAS obtains consistent feature representations free of scene noise. SEAS achieves state-of-the-art (SOTA) performance on two benchmark datasets: 97.1% mAP on CUHK-SYSU and 60.5% mAP on PRW. The code is available at https://github.com/whbdmu/SEAS.
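To make the bilateral-modulation idea concrete, the PyTorch sketch below illustrates one plausible reading of the abstract. It is a minimal sketch under stated assumptions: the module names (BackgroundModulation, ForegroundModulation), the pooling granularities, and the tensor shapes are hypothetical and are not the authors' implementation; see the linked repository for the actual code.

# Hypothetical sketch of the bilateral-modulation idea; names, shapes,
# and pooling choices are assumptions, not the SEAS implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackgroundModulation(nn.Module):
    """Encode an RoI feature map into a norm-aware, multi-granularity embedding."""
    def __init__(self, in_channels: int = 256, embed_dim: int = 256):
        super().__init__()
        # One projection per granularity (global, half, quarter pooling).
        self.projs = nn.ModuleList(
            [nn.Linear(in_channels, embed_dim) for _ in range(3)]
        )

    def forward(self, roi_feat: torch.Tensor) -> torch.Tensor:
        # roi_feat: (N, C, H, W) features cropped from detected boxes.
        parts = []
        for k, proj in zip((1, 2, 4), self.projs):
            pooled = F.adaptive_avg_pool2d(roi_feat, k).flatten(2).mean(-1)  # (N, C)
            parts.append(F.normalize(proj(pooled), dim=-1))  # unit-norm per granularity
        return torch.cat(parts, dim=-1)  # (N, 3 * embed_dim)

class ForegroundModulation(nn.Module):
    """Predict a clutter-reduction offset for the embedding from the scene map."""
    def __init__(self, in_channels: int = 256, embed_dim: int = 768):
        super().__init__()
        self.offset = nn.Linear(in_channels, embed_dim)

    def forward(self, scene_feat: torch.Tensor, embedding: torch.Tensor) -> torch.Tensor:
        # scene_feat: (N, C, H, W) feature map of the whole scene image.
        ctx = scene_feat.mean(dim=(2, 3))    # (N, C) global scene context
        return embedding + self.offset(ctx)  # modulated, scene-adapted embedding

# Example usage with dummy tensors:
bmn = BackgroundModulation()
fmn = ForegroundModulation()
roi = torch.randn(8, 256, 14, 14)    # RoI-aligned box features
scene = torch.randn(8, 256, 64, 64)  # backbone map of the scene image
emb = fmn(scene, bmn(roi))           # (8, 768) scene-adapted embedding

Concatenating L2-normalized per-granularity embeddings is one simple way to realize a "norm-aware, multi-granularity" encoding, and the additive offset computed from global scene context mirrors the clutter-reduction modulation the abstract describes.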
Keywords:
Computer Vision: CV: Image and video retrieval 
Computer Vision: CV: Representation learning