Multi-Modal Image Annotation with Multi-Instance Multi-Label LDA
Cam-Tu Nguyen, De-Chuan Zhan, Zhi-Hua Zhou

This paper studies image annotation in a multi-modal setting where both visual and textual information are available. We propose Multi-modal Multi-instance Multi-label Latent Dirichlet Allocation (M3LDA), a model consisting of a visual-label part, a textual-label part, and a label-topic part. The basic idea is that the topic decided by the visual information and the topic decided by the textual information should be consistent, leading to the correct label assignment. In particular, M3LDA is able to annotate image regions, and thus provides a promising way to understand the relation between input patterns and output semantics. Experiments on Corel5K and ImageCLEF validate the effectiveness of the proposed method.