Culturally-aware Image Captioning


Youngsik Yun

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Doctoral Consortium, pages 8520-8521. https://doi.org/10.24963/ijcai.2024/975

The primary research challenge lies in mitigating and measuring geographical and demographic biases in generative models, which is crucial for ensuring fairness in AI applications. Existing models trained on web-crawled datasets such as LAION-400M often perpetuate harmful stereotypes and biases, especially concerning minority groups and under-represented regions. To address this, I proposed a framework called CIC (Culturally-aware Image Captioning) that generates culturally-aware image captions. The framework uses visual question answering (VQA) to extract cultural visual elements from an image, then combines these elements with caption prompts as input to large language models (LLMs), which generate the culturally-aware captions. Human evaluations confirm that the approach depicts cultural information accurately. Two key future directions are outlined. First, current image caption evaluation methods are inadequate for assessing culturally-aware captions, so new evaluation metrics that leverage cultural datasets and representations need to be developed. Second, ethical considerations, particularly the stereotypes embedded in existing models, call for building consensus and developing standards that draw on diverse cultural perspectives. Addressing these challenges is vital for the responsible deployment of AI technologies in diverse real-world contexts.
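The two-stage pipeline described above (VQA to extract cultural visual elements, then an LLM prompt that combines them with a caption prompt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CIC implementation: the VQA model is assumed to be any callable taking an image and a question, and the question set, prompt template, and function names (`extract_cultural_elements`, `build_llm_prompt`, `fake_vqa`) are all hypothetical. The LLM call itself is left out.

```python
from typing import Callable, Dict

# Hypothetical questions probing for cultural visual elements.
# The actual questions used by CIC are not specified in this abstract.
CULTURAL_QUESTIONS: Dict[str, str] = {
    "clothing": "What traditional clothing, if any, is visible in the image?",
    "food": "What food or drink, if any, is visible in the image?",
    "architecture": "What style of architecture, if any, is visible in the image?",
}


def extract_cultural_elements(
    image: object,
    vqa: Callable[[object, str], str],
    questions: Dict[str, str] = CULTURAL_QUESTIONS,
) -> Dict[str, str]:
    """Stage 1 (assumed interface): ask each question with a VQA model, keyed by category."""
    return {category: vqa(image, question).strip() for category, question in questions.items()}


def build_llm_prompt(caption_prompt: str, elements: Dict[str, str]) -> str:
    """Stage 2 (hypothetical template): merge the caption prompt with the extracted elements."""
    lines = [caption_prompt, "", "Cultural visual elements:"]
    for category, answer in elements.items():
        if answer:  # skip categories the VQA model found nothing for
            lines.append(f"- {category}: {answer}")
    lines.append("")
    lines.append("Write one caption that describes the image and accurately reflects these cultural elements.")
    return "\n".join(lines)


# Hypothetical stand-in for a real VQA model: returns canned answers by question text.
_CANNED_ANSWERS = {
    CULTURAL_QUESTIONS["clothing"]: "a hanbok",
    CULTURAL_QUESTIONS["architecture"]: "a palace with a curved tiled roof",
}


def fake_vqa(image: object, question: str) -> str:
    return _CANNED_ANSWERS.get(question, "")


if __name__ == "__main__":
    elements = extract_cultural_elements(image=None, vqa=fake_vqa)
    prompt = build_llm_prompt("Describe this photo in one sentence.", elements)
    print(prompt)  # this string would then be sent to an LLM (not shown)
```

Keeping the VQA model as an injected callable separates element extraction from caption generation, so either stage could be swapped for a different model without touching the other.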
Keywords:
DC: AI Ethics, Trust, Fairness
DC: Natural Language Processing
DC: Computer Vision