HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning

HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning

Shiyang Yan, Jun Xu, Yuai Liu, Lin Xu

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 5342-5348. https://doi.org/10.24963/ijcai.2019/742

Person re-identification (re-ID) aims to recognize a person-of-interest across different cameras with notable appearance variance. Existing research works focused on the capability and robustness of visual representation. In this paper, instead, we propose a novel hierarchical offshoot recurrent network (HorNet) for improving person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which could significantly alleviate the variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to fulfill domain transfer and language descriptions generation. Then the proposed HorNet can learn the visual and language representation from both the images and captions jointly, and thus enhance the performance of person re-ID. Extensive experiments are conducted on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and Duke-MTMC, demonstrating the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.
Keywords:
Natural Language Processing: Natural Language Generation
Machine Learning: Deep Learning
Natural Language Processing: Embeddings
Computer Vision: Language and Vision
Computer Vision: Computer Vision