An Image-enhanced Molecular Graph Representation Learning Framework

An Image-enhanced Molecular Graph Representation Learning Framework

Hongxin Xiang, Shuting Jin, Jun Xia, Man Zhou, Jianmin Wang, Li Zeng, Xiangxiang Zeng

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 6107-6115. https://doi.org/10.24963/ijcai.2024/675

Extracting rich molecular representation is a crucial prerequisite for accurate drug discovery. Recent molecular representation learning methods achieve impressive progress, but the paradigm of learning from a single modality gradually encounters the bottleneck of limited representation capabilities. In this work, we fully consider the rich visual information contained in 3D conformation molecular images (i.e., texture, shadow, color and planar spatial information) and distill graph-based models for more discriminative drug discovery. Specifically, we propose an image-enhanced molecular graph representation learning framework that leverages multi-view molecular images rendered from 3D conformations to boost molecular graph representations. To extract useful auxiliary knowledge from multi-view images, we design a teacher, which is pre-trained on 2 million molecules with conformations through five meticulously designed pre-training tasks. To transfer knowledge from teacher to graph-based students, we pose an efficient cross-modal knowledge distillation strategy with knowledge enhancer and task enhancer. It is worth noting that the distillation architecture of IEM can be directly integrated into existing graph-based models, and significantly improves the capabilities of these models (e.g. GIN, EdgePred, GraphMVP, MoleBERT) for molecular representation learning. In particular, GraphMVP and MoleBERT equipped with IEM achieve new state-of-the-art performance on MoleculeNet benchmark, achieving average 73.89% and 73.81% ROC-AUC, respectively. Code is available at https://github.com/HongxinXiang/IEM.
Keywords:
Multidisciplinary Topics and Applications: MTA: Bioinformatics
Machine Learning: ML: Knowledge-aided learning
Machine Learning: ML: Self-supervised Learning
Machine Learning: ML: Representation learning