VecoCare: Visit Sequences-Clinical Notes Joint Learning for Diagnosis Prediction in Healthcare Data

Yongxin Xu, Kai Yang, Chaohe Zhang, Peinie Zou, Zhiyuan Wang, Hongxin Ding, Junfeng Zhao, Yasha Wang, Bing Xie

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 4921-4929. https://doi.org/10.24963/ijcai.2023/547

Because the electronic health record (EHR) data available in practical diagnosis prediction scenarios is often insufficient, most work focuses on learning powerful patient representations from either structured EHR data (e.g., temporal medical events and lab test results) or unstructured data (e.g., clinical notes). However, synthesizing the rich information in both remains underexplored. First, heterogeneous semantic biases between the two data types heavily hinder the synthesis of their representation spaces, which is critical for diagnosis prediction. Second, the uneven quality of some clinical notes leads to inadequate representations of the patients to be predicted. Third, typical attention mechanisms mainly aggregate information from similar patients and ignore important auxiliary information from others. To tackle these challenges, we propose a novel visit sequences-clinical notes joint learning approach, dubbed VecoCare. It performs a Gromov-Wasserstein Distance (GWD)-based contrastive learning task and an adaptive masked language model task in a sequential pre-training manner to reduce heterogeneous semantic biases. After pre-training, VecoCare further aggregates information from both similar and dissimilar patients through a dual-channel retrieval mechanism. We conduct diagnosis prediction experiments on two real-world datasets, and the results indicate that VecoCare outperforms state-of-the-art approaches. Moreover, the findings discovered by VecoCare are consistent with medical research.
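For readers unfamiliar with the Gromov-Wasserstein discrepancy mentioned above, the following is a minimal illustrative sketch, not the authors' implementation. It only evaluates the standard squared-loss Gromov-Wasserstein objective for a fixed coupling between two sets of embeddings (for example, visit-sequence and clinical-note representations living in different spaces); it does not optimize the coupling and does not include the contrastive loss. The function names `pairwise_sq_dists` and `gw_objective` are hypothetical and chosen for illustration.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distances between all rows of x (shape: n x d)."""
    diff = x[:, None, :] - x[None, :, :]
    return (diff ** 2).sum(axis=-1)


def gw_objective(c1, c2, coupling):
    """Squared-loss Gromov-Wasserstein objective for a fixed coupling.

    Computes sum_{i,j,k,l} (c1[i,k] - c2[j,l])**2 * T[i,j] * T[k,l],
    where c1 (n x n) and c2 (m x m) are intra-space distance matrices
    and T = coupling (n x m) is a joint distribution over matched pairs.
    The square is expanded into three terms to avoid a 4-D tensor.
    """
    p = coupling.sum(axis=1)  # marginal over the first space
    q = coupling.sum(axis=0)  # marginal over the second space
    term1 = p @ (c1 ** 2) @ p
    term2 = q @ (c2 ** 2) @ q
    cross = np.sum((c1 @ coupling @ c2.T) * coupling)
    return term1 + term2 - 2.0 * cross
```

Because the objective compares distances within each space rather than across spaces, the two embedding sets may have different dimensionalities, which is one reason this kind of discrepancy is a natural fit for aligning heterogeneous representations. In practice the coupling would be optimized by a dedicated solver rather than fixed by hand.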
Keywords:
Multidisciplinary Topics and Applications: MDA: Health and medicine
Machine Learning: ML: Representation learning