Dual Adversarial Networks for Zero-shot Cross-media Retrieval

Jingze Chi, Yuxin Peng

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 663-669. https://doi.org/10.24963/ijcai.2018/92

Existing cross-media retrieval methods usually require the testing categories to be the same as the training categories, so they cannot support retrieval of the growing number of new categories. Inspired by zero-shot learning, this paper proposes zero-shot cross-media retrieval to address this problem; it aims to retrieve data of new categories across different media types. Zero-shot cross-media retrieval is challenging because it must handle not only the inconsistent semantics between new and known categories, but also the heterogeneous distributions of different media types. To address these challenges, this paper proposes Dual Adversarial Networks for Zero-shot Cross-media Retrieval (DANZCR), which, to the best of our knowledge, is the first approach to zero-shot cross-media retrieval. DANZCR consists of two GANs in a dual structure, one for common representation generation and one for original representation reconstruction. Together they capture the underlying data structures and strengthen the relations between the input data and the semantic space, so that the model generalizes across seen and unseen categories. DANZCR exploits word embeddings to learn common representations in the semantic space via adversarial learning, which preserves the inherent cross-media correlation and enhances knowledge transfer to new categories. Experiments on three widely used cross-media retrieval datasets show the effectiveness of our approach.
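As a rough illustration of the dual structure described in the abstract, the following Python sketch wires up two adversarial pairs for a single media type: a forward pair that maps media features into a word-embedding (semantic) space, and a backward pair that reconstructs the original features from that common representation. It is a minimal sketch under loudly stated assumptions: every name here (`DualAdversarialSketch`, `retrieve`, `common`), the linear generators, the logistic discriminators and the unweighted loss terms are hypothetical simplifications, not the DANZCR architecture or its training objective, and no optimization step is shown.

```python
import math
import random

# Hypothetical sketch: linear generators and logistic discriminators stand in
# for real networks. This is NOT the DANZCR architecture or objective.


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p, target, eps=1e-7):
    """Binary cross-entropy for one probability p and a label in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


class DualAdversarialSketch:
    """Two adversarial pairs for one media type (hypothetical layout).

    Forward pair:  G maps features -> semantic (word-embedding) space;
                   d_sem scores real word embeddings against G's outputs.
    Backward pair: F maps common representations back to feature space;
                   d_feat scores real features against F's reconstructions.
    """

    def __init__(self, feat_dim, sem_dim, seed=0):
        rng = random.Random(seed)

        def gauss():
            return rng.gauss(0.0, 0.1)

        self.G = [[gauss() for _ in range(feat_dim)] for _ in range(sem_dim)]
        self.F = [[gauss() for _ in range(sem_dim)] for _ in range(feat_dim)]
        self.d_sem = [gauss() for _ in range(sem_dim)]
        self.d_feat = [gauss() for _ in range(feat_dim)]

    def common(self, x):
        """Common representation of feature vector x in the semantic space."""
        return matvec(self.G, x)

    def losses(self, x, word_emb):
        """Per-sample losses for feature x and its class word embedding."""
        c = self.common(x)           # forward generation: x -> G(x)
        r = matvec(self.F, c)        # backward reconstruction: G(x) -> F(G(x))
        p_real_sem = sigmoid(sum(w * a for w, a in zip(self.d_sem, word_emb)))
        p_fake_sem = sigmoid(sum(w * a for w, a in zip(self.d_sem, c)))
        p_real_feat = sigmoid(sum(w * a for w, a in zip(self.d_feat, x)))
        p_fake_feat = sigmoid(sum(w * a for w, a in zip(self.d_feat, r)))
        return {
            # Discriminators: label real samples 1 and generated samples 0.
            "d_sem": bce(p_real_sem, 1) + bce(p_fake_sem, 0),
            "d_feat": bce(p_real_feat, 1) + bce(p_fake_feat, 0),
            # Generators: push both discriminators toward 1 on generated data.
            "g": bce(p_fake_sem, 1) + bce(p_fake_feat, 1),
            # Reconstruction: mean squared error between x and F(G(x)).
            "rec": sum((a - b) ** 2 for a, b in zip(x, r)) / len(x),
        }


def retrieve(query_common, gallery_common):
    """Rank gallery indices by cosine similarity to the query, best first."""
    scores = [cosine(query_common, g) for g in gallery_common]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this sketch, one such model per media type would map, say, image and text features into the same word-embedding space, so `retrieve` can rank items of one media type against a query from another, including items of categories never seen during training. Tying the common space to word embeddings is what lets the ranking extend to new categories in this toy setup; how DANZCR actually balances and optimizes these terms is described in the paper, not here.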
Keywords:
Computer Vision: Language and Vision
Multidisciplinary Topics and Applications: Information Retrieval