Multimodal Representation Distribution Learning for Medical Image Segmentation


Chao Huang, Weichao Cai, Qiuping Jiang, Zhihua Wang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 4156-4164. https://doi.org/10.24963/ijcai.2024/459

Medical image segmentation is one of the most critical tasks in medical image analysis. However, the performance of existing methods is limited by the scarcity of high-quality labeled data, since data annotation is expensive. To alleviate this limitation, we propose a novel multi-modal learning method for medical image segmentation that incorporates medical text annotations to compensate for deficiencies in the image data. Moreover, previous multi-modal fusion methods ignore the commonalities and differences between modalities; ideally, the fused features should maximize valuable information while minimizing redundant information. To achieve this goal, we propose a multimodal feature distribution learning method that models the commonalities and differences between text and image. Since medical image segmentation requires predicting detailed segmentation boundaries, we also design a prompt encoder to achieve fine-grained segmentation. Experimental results on three datasets show that the proposed method achieves superior segmentation performance. Source code will be available at https://github.com/GPIOX/Multimodal.git.
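To make the distribution-learning idea above concrete, here is a minimal PyTorch sketch, not the authors' implementation: each modality's features are mapped to a diagonal Gaussian, the two Gaussians are fused by a precision-weighted product (so information both modalities agree on is reinforced, while high-variance, modality-specific noise is down-weighted), and a KL term between them can act as a redundancy/alignment regularizer. All names (`GaussianHead`, `DistributionFusion`) and dimensions are hypothetical assumptions for illustration.

```python
import torch
import torch.nn as nn


class GaussianHead(nn.Module):
    """Maps modality features to the mean/log-variance of a diagonal Gaussian."""
    def __init__(self, in_dim: int, latent_dim: int):
        super().__init__()
        self.mu = nn.Linear(in_dim, latent_dim)
        self.logvar = nn.Linear(in_dim, latent_dim)

    def forward(self, x: torch.Tensor):
        return self.mu(x), self.logvar(x)


class DistributionFusion(nn.Module):
    """Fuses image and text features as Gaussians via a product of experts."""
    def __init__(self, img_dim: int, txt_dim: int, latent_dim: int):
        super().__init__()
        self.img_head = GaussianHead(img_dim, latent_dim)
        self.txt_head = GaussianHead(txt_dim, latent_dim)

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor):
        mu_i, logvar_i = self.img_head(img_feat)
        mu_t, logvar_t = self.txt_head(txt_feat)
        # Product of two Gaussians: precision-weighted combination of the means.
        prec_i, prec_t = torch.exp(-logvar_i), torch.exp(-logvar_t)
        prec = prec_i + prec_t
        mu = (mu_i * prec_i + mu_t * prec_t) / prec
        logvar = -torch.log(prec)
        # Reparameterization trick: sample fused features during training,
        # use the fused mean at inference time.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar) if self.training else mu
        # KL(image || text) as an assumed alignment regularizer between modalities.
        kl = 0.5 * ((logvar_t - logvar_i)
                    + (torch.exp(logvar_i) + (mu_i - mu_t) ** 2) / torch.exp(logvar_t)
                    - 1).sum(-1).mean()
        return z, kl


# Usage example with arbitrary feature sizes:
fusion = DistributionFusion(img_dim=256, txt_dim=768, latent_dim=128)
z, kl = fusion(torch.randn(4, 256), torch.randn(4, 768))
```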
Keywords:
Machine Learning: Multi-modal learning
Computer Vision: Segmentation
Computer Vision: Representation learning