Visual Data Synthesis via GAN for Zero-Shot Video Classification
Chenrui Zhang, Yuxin Peng
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 1128-1134.
https://doi.org/10.24963/ijcai.2018/157
Zero-Shot Learning (ZSL) in video classification
is a promising research direction, which aims to
tackle the challenge posed by the explosive growth of video
categories. Most existing methods exploit seen-
to-unseen correlation by learning a projection between
visual and semantic spaces. However, such
projection-based paradigms cannot fully utilize the
discriminative information implied in data distribution,
and commonly suffer from the information
degradation issue caused by "heterogeneity gap".
In this paper, we propose a visual data synthesis
framework via GAN to address these problems.
Specifically, both semantic knowledge and visual
distribution are leveraged to synthesize video features
of unseen categories, so that ZSL can be turned
into a typical supervised problem with the synthetic
features. First, we propose multi-level semantic
inference to boost video feature synthesis, which
captures the discriminative information implied in
joint visual-semantic distribution via feature-level
and label-level semantic inference. Second, we
propose Matching-aware Mutual Information Correlation
to overcome the information degradation issue,
which captures seen-to-unseen correlation in
matched and mismatched visual-semantic pairs by
mutual information, providing the zero-shot synthesis
procedure with robust guidance signals. Experimental
results on four video datasets demonstrate
that our approach significantly improves zero-shot
video classification performance.
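The core synthesize-then-classify idea can be illustrated with a minimal, self-contained sketch. Note the heavy simplifications: a least-squares linear map stands in for the paper's GAN generator, a nearest-centroid rule stands in for the downstream classifier, and all dimensions, class counts, and the simulated data are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: each class has a semantic embedding
# (e.g., a word vector); dimensions are illustrative only.
n_seen, n_unseen = 20, 3
sem_dim, feat_dim = 8, 16

sem_seen = rng.normal(size=(n_seen, sem_dim))
sem_unseen = rng.normal(size=(n_unseen, sem_dim))

# Ground-truth semantic-to-visual map, used ONLY to simulate
# "real" visual features for this toy example.
true_map = rng.normal(size=(sem_dim, feat_dim))

def real_features(sem, n_per_class):
    """Simulate visual features clustered around each class prototype."""
    X, y = [], []
    for c, s in enumerate(sem):
        proto = s @ true_map
        X.append(proto + 0.1 * rng.normal(size=(n_per_class, feat_dim)))
        y += [c] * n_per_class
    return np.vstack(X), np.array(y)

# Seen classes: real visual features are available for training.
X_seen, y_seen = real_features(sem_seen, 50)

# "Generator": least-squares fit from semantic embeddings to seen-class
# feature prototypes, standing in for the conditional GAN generator.
protos_seen = np.vstack([X_seen[y_seen == c].mean(0) for c in range(n_seen)])
G, *_ = np.linalg.lstsq(sem_seen, protos_seen, rcond=None)

def synthesize(sem, n_per_class, noise=0.1):
    """Synthesize visual features from semantic embeddings alone."""
    X, y = [], []
    for c, s in enumerate(sem):
        proto = s @ G
        X.append(proto + noise * rng.normal(size=(n_per_class, feat_dim)))
        y += [c] * n_per_class
    return np.vstack(X), np.array(y)

# Synthesize features for UNSEEN classes: no real unseen samples used.
X_syn, y_syn = synthesize(sem_unseen, 50)

# ZSL is now a supervised problem: fit a nearest-centroid classifier on
# the synthetic features, evaluate on simulated real unseen features.
centroids = np.vstack([X_syn[y_syn == c].mean(0) for c in range(n_unseen)])
X_test, y_test = real_features(sem_unseen, 20)
pred = np.argmin(((X_test[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
acc = (pred == y_test).mean()
print(f"unseen-class accuracy: {acc:.2f}")
```

Because the toy generator recovers the (linear) semantic-to-visual map from seen classes, the synthetic unseen-class features land near the true class clusters and the supervised classifier transfers; the paper's contribution is making this transfer work for real, non-linear video feature distributions via adversarial synthesis.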
Keywords:
Computer Vision: Recognition: Detection, Categorization, Indexing, Matching, Retrieval, Semantic Interpretation