TUCH: Turning Cross-view Hashing into Single-view Hashing via Generative Adversarial Nets

Xin Zhao, Guiguang Ding, Yuchen Guo, Jungong Han, Yue Gao

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 3511-3517. https://doi.org/10.24963/ijcai.2017/491

Cross-view retrieval, which focuses on searching for images in response to text queries or vice versa, has received increasing attention recently. Cross-view hashing aims to solve the cross-view retrieval problem efficiently with binary hash codes. Most existing works on cross-view hashing exploit multi-view embedding methods to tackle this problem, which inevitably causes information loss in both the image and text domains. Inspired by Generative Adversarial Nets (GANs), this paper presents a new model that is able to Turn Cross-view Hashing into single-view hashing (TUCH), thus enabling image information to be preserved as much as possible. TUCH is a novel deep architecture that integrates a language model network T for text feature extraction, a generator network G to generate fake images from text features, and a hashing network H for learning hashing functions that produce compact binary codes. Our architecture effectively unifies joint generative adversarial learning and cross-view hashing. Extensive empirical evidence shows that TUCH achieves state-of-the-art results, especially on text-to-image retrieval, on image-sentence datasets, i.e., the standard IAPRTC-12 and the large-scale Microsoft COCO.
Keywords:
Machine Learning: Feature Selection/Construction
Natural Language Processing: Information Retrieval
Machine Learning: Deep Learning
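
To make the three-network design concrete, below is a minimal PyTorch sketch of the pipeline the abstract describes: a language model network T that extracts text features, a generator G that synthesizes a fake image from the text feature, and a single hashing network H applied to both real and generated images, which is what turns the cross-view problem into a single-view one. All layer sizes, the LSTM text encoder, the DCGAN-style generator, and the tanh relaxation of binary codes are illustrative assumptions, not the paper's actual configuration.

import torch
import torch.nn as nn

class TextNet(nn.Module):
    """T: language model network extracting a feature vector from a sentence.
    An LSTM encoder is an assumption; the paper's exact language model may differ."""
    def __init__(self, vocab_size=10000, embed_dim=300, feat_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, feat_dim, batch_first=True)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        _, (h, _) = self.rnn(self.embed(tokens))
        return h[-1]                              # (batch, feat_dim)

class Generator(nn.Module):
    """G: generates a fake image from a text feature (DCGAN-style upsampling)."""
    def __init__(self, feat_dim=512, img_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, 256, 4, 1, 0), nn.ReLU(True),  # 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(True),       # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(True),        # 16x16
            nn.ConvTranspose2d(64, img_channels, 4, 2, 1), nn.Tanh())   # 32x32

    def forward(self, text_feat):                 # text_feat: (batch, feat_dim)
        return self.net(text_feat.unsqueeze(-1).unsqueeze(-1))

class HashNet(nn.Module):
    """H: maps real or generated images to relaxed binary codes in [-1, 1]."""
    def __init__(self, img_channels=3, code_len=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(img_channels, 64, 4, 2, 1), nn.ReLU(True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fc = nn.Linear(128, code_len)

    def forward(self, images):
        # tanh is a common continuous relaxation; sign() would binarize at test time
        return torch.tanh(self.fc(self.conv(images)))

# Single-view hashing: both views pass through the same hashing network H.
T, G, H = TextNet(), Generator(), HashNet()
tokens = torch.randint(0, 10000, (2, 12))         # a toy batch of two sentences
images = torch.randn(2, 3, 32, 32)                # matching real images
text_codes = H(G(T(tokens)))                      # text -> fake image -> code
image_codes = H(images)                           # real image -> code

Because text queries are routed through G before hashing, both modalities share one code space learned by H alone, which is the sense in which cross-view hashing is reduced to single-view hashing.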