Multi-Modal Sarcasm Detection Based on Dual Generative Processes

Huiying Ma, Dongxiao He, Xiaobao Wang, Di Jin, Meng Ge, Longbiao Wang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 2279-2287. https://doi.org/10.24963/ijcai.2024/252

With the growth of the internet, sarcastic expression on social media has become increasingly diverse. Consequently, multimodal sarcasm detection has emerged as a valuable tool for helping users comprehend and interpret sarcastic expressions. Previous research suggests that effectively integrating three modalities (image, text, and their inconsistencies) enhances sarcasm detection. However, some instances of sarcasm can be detected from a single modality, while others require multiple modalities for accurate recognition. This variability suggests that each modality contributes differently to sarcasm detection, and that a traditional fusion method may bias the fused information while failing to show explicitly the predictive ability of each modality. Therefore, we propose a multimodal sarcasm detection method based on dual generative processes. The dual generative processes map features into the same semantic space to deeply explore emotional inconsistencies between modalities. Concurrently, by incorporating the concept of strong and weak modalities, we explicitly model each modality's contribution based on its prediction performance and autonomously adjust the weight distribution. Experimental results on publicly available multimodal sarcasm detection datasets validate the superiority of our proposed model.
Keywords:
Data Mining: DM: Mining text, web, social media
Natural Language Processing: NLP: Sentiment analysis, stylistic analysis, and argument mining
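
The abstract's idea of weighting modalities by their prediction performance can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_by_prediction_quality`, the `temperature` parameter, the example logits, and the choice of a softmax over each modality's probability for the true label are all assumptions made for illustration. The sketch only shows how a "strong" modality (one that predicts the label well) could receive a larger fusion weight than a "weak" one.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_by_prediction_quality(modality_logits, label, temperature=1.0):
    """Weight each modality by how well it predicts the true label.

    modality_logits: dict mapping a modality name to
        [logit_non_sarcastic, logit_sarcastic] (hypothetical layout).
    label: index of the true class (0 or 1), available at training time.
    temperature: assumed knob; smaller values sharpen the weights.
    """
    names = list(modality_logits)
    probs = {n: softmax(modality_logits[n]) for n in names}
    # Score a modality by the probability it assigns to the true label.
    scores = [probs[n][label] / temperature for n in names]
    # Normalize the scores so the modality weights sum to 1.
    weights = dict(zip(names, softmax(scores)))
    # Fuse class probabilities as a weighted average over modalities.
    fused = [sum(weights[n] * probs[n][c] for n in names) for c in range(2)]
    return weights, fused


# Hypothetical logits for one sarcastic sample (label = 1).
example_logits = {
    "text": [0.2, 2.0],           # strong: confident and correct
    "image": [1.0, 0.5],          # weak: leans toward the wrong class
    "inconsistency": [0.0, 0.0],  # uninformative
}
weights, fused = fuse_by_prediction_quality(example_logits, label=1)
```

In this toy example the text modality receives the largest weight and the image modality the smallest, because the weights follow each modality's probability for the true label. How the paper itself scores and adjusts modality contributions is described in the full text, not in the abstract.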