Cross-Modality Person Re-Identification with Generative Adversarial Training

Cross-Modality Person Re-Identification with Generative Adversarial Training

Pingyang Dai, Rongrong Ji, Haibin Wang, Qiong Wu, Yuyu Huang

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 677-683. https://doi.org/10.24963/ijcai.2018/94

Person re-identification (Re-ID) is an important task in video surveillance which automatically searches and identifies people across different cameras. Despite the extensive Re-ID progress in RGB cameras, few works have studied the Re-ID between infrared and RGB images, which is essentially a cross-modality problem and widely encountered in real-world scenarios. The key challenge lies in two folds, i.e., the lack of discriminative information to re-identify the same person between RGB and infrared modalities, and the difficulty to learn a robust metric towards such a large-scale cross-modality retrieval. In this paper, we tackle the above two challenges by proposing a novel cross-modality generative adversarial network (termed cmGAN). To handle the issue of insufficient discriminative information, we leverage the cutting-edge generative adversarial training to design our own discriminator to learn discriminative feature representation from different modalities. To handle the issue of large-scale cross-modality metric learning, we integrates both identification loss and cross-modality triplet loss, which minimize inter-class ambiguity while maximizing cross-modality similarity among instances. The entire cmGAN can be trained in an end-to-end manner by using standard deep neural network framework. We have quantized the performance of our work in the newly-released SYSU RGB-IR Re-ID benchmark, and have reported superior performance, i.e., Cumulative Match Characteristic curve (CMC) and Mean Average Precision (MAP), over the state-of-the-art works [Wu et al., 2017], respectively.
Keywords:
Computer Vision: Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation
Computer Vision: Computer Vision