HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning

Shiyang Yan; Jun Xu; Yuai Liu; Lin Xu

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

Main track. Pages 5342-5348. https://doi.org/10.24963/ijcai.2019/742

Person re-identification (re-ID) aims to recognize a person of interest across different cameras despite notable appearance variance. Existing work has focused on the capability and robustness of visual representations. In this paper, we instead propose a novel hierarchical offshoot recurrent network (HorNet) that improves person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which could significantly alleviate the appearance variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to perform domain transfer and generate language descriptions. The proposed HorNet then learns visual and language representations jointly from the images and captions, and thus enhances person re-ID performance. Extensive experiments are conducted on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and Duke-MTMC, demonstrating the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.
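To make the idea of combining visual and caption features concrete, the following is a minimal sketch of re-ID as retrieval over fused embeddings. It is not the HorNet architecture: the fusion by concatenation, the helper names (`l2_normalize`, `fuse`, `rank_gallery`), and the toy two-dimensional vectors are all assumptions chosen for illustration. In practice the embeddings would come from a trained image encoder and a caption encoder.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse(visual, caption):
    """Hypothetical fusion: concatenate the normalized visual and caption
    embeddings, then renormalize. This is an assumed stand-in, not the
    paper's method."""
    return l2_normalize(l2_normalize(visual) + l2_normalize(caption))

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [(cosine(query, feat), idx) for idx, feat in enumerate(gallery)]
    return [idx for _, idx in sorted(scores, key=lambda s: -s[0])]

# Toy data (made up): each entry is (visual embedding, caption embedding).
query = fuse([1.0, 0.0], [0.0, 1.0])
gallery = [
    fuse([0.9, 0.1], [0.1, 0.9]),  # close in both appearance and caption
    fuse([1.0, 0.0], [1.0, 0.0]),  # same appearance, different caption
    fuse([0.0, 1.0], [1.0, 0.0]),  # different in both
]
ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy setup the second gallery entry matches the query visually but is ranked below the first because its caption embedding disagrees, which illustrates how a language signal could help separate visually similar candidates.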

Keywords:

Natural Language Processing: Natural Language Generation

Machine Learning: Deep Learning

Natural Language Processing: Embeddings

Computer Vision: Language and Vision

Computer Vision: Computer Vision