How Unlabeled Web Videos Help Complex Event Detection?

Huan Liu, Qinghua Zheng, Minnan Luo, Dingwen Zhang, Xiaojun Chang, Cheng Deng

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 4040-4046. https://doi.org/10.24963/ijcai.2017/564

The lack of labeled exemplars is a major factor that makes multimedia event detection (MED) complicated and challenging. Utilizing manually selected and labeled external sources is an effective way to enhance MED performance. However, building such data usually requires professional human annotators, and the procedure is too time-consuming and costly to scale. In this paper, we propose a new robust dictionary learning framework for complex event detection, which handles both labeled videos and easy-to-obtain unlabeled web videos by having them share the same dictionary. By employing an lq-norm based loss jointly with structured sparsity based regularization, our model shows strong robustness against the substantial number of noisy and outlier videos from open sources. We exploit an effective optimization algorithm to solve the proposed highly non-smooth and non-convex problem. Extensive experimental results on the standard TRECVID MEDTest 2013 and TRECVID MEDTest 2014 datasets demonstrate the effectiveness and superiority of the proposed framework for complex event detection.
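To illustrate why an lq-norm based loss can be more robust to outlier videos than a squared loss, the following minimal sketch compares how much a single outlier contributes to each. It is not the formulation from the paper: the abstract does not give the objective, so the per-sample form of the loss, the choice q = 0.5, the row-wise l2,1 penalty used as a stand-in for structured sparsity, and all function names are assumptions made for illustration only.

```python
import math


def column_norms(matrix):
    """Return the l2 norm of each column of a list-of-rows matrix."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*matrix)]


def lq_loss(residuals, q):
    """Assumed loss form: sum over samples (columns) of ||residual||_2 ** q."""
    return sum(n ** q for n in column_norms(residuals))


def l21_norm(codes):
    """Sum of the l2 norms of the rows, a common structured-sparsity penalty."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in codes)


# Toy reconstruction residuals: two well-fit samples and one outlier (last column).
residuals = [
    [0.1, 0.2, 10.0],
    [0.1, 0.1, 10.0],
]

outlier_norm = column_norms(residuals)[-1]

# Fraction of the total loss that comes from the outlier under each loss.
share_squared = outlier_norm ** 2 / lq_loss(residuals, q=2.0)
share_robust = outlier_norm ** 0.5 / lq_loss(residuals, q=0.5)

# A code matrix with one all-zero row; only the nonzero row is penalized.
penalty = l21_norm([[3.0, 4.0], [0.0, 0.0]])
```

In this toy setting the outlier accounts for nearly all of the squared loss but a smaller fraction of the q = 0.5 loss, which is the intuition behind using small q when the unlabeled web videos are noisy.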
Keywords:
Natural Language Processing: Information Retrieval
Robotics and Vision: Vision and Perception