Integrating Vision-Language Semantic Graphs in Multi-View Clustering

JunLong Ke, Zichen Wen, Yechenhao Yang, Chenhang Cui, Yazhou Ren, Xiaorong Pu, Lifang He

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 4273-4281. https://doi.org/10.24963/ijcai.2024/472

In recent years, a variety of graph learning-based multi-view clustering (MVC) methods have emerged. However, these methods still struggle to extract latent features from real-world data, particularly high-resolution color images with high-dimensional features, and the task is especially difficult when images are visually similar yet semantically diverse. To address this issue, we present a novel multi-view clustering method built on large-scale pre-trained models, named Integrating Vision-Language Semantic Graphs in Multi-View Clustering (IVSGMV), which harnesses vision-language pre-trained models to enhance clustering performance and tackles the challenges of tuning pre-trained models on multi-view data without supervision. We introduce an effective unsupervised approach for constructing semantic graphs from multi-view image datasets with pre-trained encoders, addressing the inherent spatial noise and imbalance of these encoders through graph filters and a joint process that integrates both image node and edge features. We further demonstrate our approach on large-scale multi-view image clustering, notably the high-resolution MVImgNet dataset, achieving 82% accuracy. Moreover, our method extends the zero-shot capabilities of large-scale pre-trained models, yielding strong clustering performance on multi-view datasets not seen during training.
Keywords:
Machine Learning: ML: Multi-view learning
Machine Learning: ML: Clustering
Machine Learning: ML: Multi-modal learning
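The abstract describes building semantic graphs from pre-trained encoder embeddings and smoothing them with graph filters before clustering. The paper's exact construction is not reproduced here; the following is a minimal NumPy sketch of the general idea, assuming L2-normalized image embeddings from a CLIP-style visual encoder, a simple k-nearest-neighbour similarity graph, and a symmetric low-pass filter. The function names and the choices of k and filter order are illustrative assumptions, not the authors' settings.

import numpy as np

def build_semantic_graph(features, k=10):
    # features: (n, d) array of L2-normalized image embeddings from a
    # pre-trained vision-language encoder (assumption: CLIP-style features).
    sim = features @ features.T                     # cosine similarity between samples
    np.fill_diagonal(sim, -np.inf)                  # exclude self-loops from neighbour search
    adj = np.zeros_like(sim)
    idx = np.argsort(-sim, axis=1)[:, :k]           # top-k neighbours per node
    rows = np.repeat(np.arange(sim.shape[0]), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)                   # symmetrize the adjacency matrix

def graph_filter(features, adj, order=2):
    # Low-pass graph filtering: repeatedly average each node with its
    # neighbours, which smooths spatial noise in the encoder features.
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    norm_adj = adj / np.sqrt(deg) / np.sqrt(deg.T)  # D^{-1/2} A D^{-1/2}
    smoothed = features.copy()
    for _ in range(order):
        smoothed = 0.5 * (smoothed + norm_adj @ smoothed)
    return smoothed

In a multi-view setting, one such graph would be built per view (or per encoder), and the filtered node features, together with the edge structure, would feed a joint clustering step; that joint integration is the part specific to IVSGMV and is only gestured at here.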