Exploring Urban Semantics: A Multimodal Model for POI Semantic Annotation with Street View Images and Place Names

Dabin Zhang, Meng Chen, Weiming Huang, Yongshun Gong, Kai Zhao

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 2533-2541. https://doi.org/10.24963/ijcai.2024/280

Semantic annotation for points of interest (POIs) is the task of assigning a category label to a POI, which supports many POI-related services such as POI search and recommendation. Most existing solutions extract POI features from abundant user-generated content (e.g., check-ins and user comments). However, such data are often difficult to obtain, especially for newly created POIs. In this paper, we explore semantic annotation for POIs with limited information, such as POI (place) names and geographic locations. We further find that street view images provide extensive visual clues about POI attributes and can serve as an essential supplement to this limited information. To this end, we propose M3PA, a novel multimodal model for POI semantic annotation that fuses a POI's textual and visual representations. Specifically, M3PA extracts visual features from street view images with a pre-trained image encoder and aggregates them into the visual representation of a target POI through a geographic attention mechanism. M3PA also uses the contextual information of neighboring POIs to extract textual features and captures their spatial relationships through geographical encoding, yielding the textual representation of the target POI. Finally, the visual and textual representations are fused for semantic annotation. Extensive experiments on POI data from Amap validate the effectiveness of M3PA for POI semantic annotation compared with several competitive baselines.
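The abstract names three components (geographic attention over street view image features, geographically encoded neighbor context for the textual side, and a final fusion) without giving their formulas. The sketch below illustrates how such a pipeline could fit together, under loudly stated assumptions: the geographic attention is taken to be a softmax over negative haversine distance with a hypothetical temperature `tau_km`, the geographical encoding is a hypothetical sinusoidal encoding of latitude/longitude offsets, and fusion is plain concatenation. Pre-computed vectors stand in for the pre-trained image and text encoders. All function names are illustrative, not M3PA's actual implementation.

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
Vector = List[float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    """Element-wise weighted sum of equal-length vectors."""
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(vectors[0]))]


def geo_attention_pool(target: LatLon, locations: Sequence[LatLon],
                       features: Sequence[Vector], tau_km: float = 0.05) -> Tuple[Vector, Vector]:
    """Hypothetical geographic attention: images taken closer to the target POI
    get larger weights via softmax(-distance / tau_km). Returns (pooled, weights)."""
    weights = softmax([-haversine_km(target, loc) / tau_km for loc in locations])
    return weighted_sum(weights, features), weights


def offset_encoding(target: LatLon, neighbor: LatLon, num_freqs: int = 2) -> Vector:
    """Hypothetical sinusoidal encoding of a neighbor's (dlat, dlon) offset."""
    enc: Vector = []
    for d in (neighbor[0] - target[0], neighbor[1] - target[1]):
        for k in range(num_freqs):
            scale = 1000.0 * 2 ** k  # degree offsets are tiny, so scale them up
            enc += [math.sin(d * scale), math.cos(d * scale)]
    return enc


def textual_representation(target: LatLon, name_vec: Vector,
                           neighbors: Sequence[Tuple[LatLon, Vector]]) -> Vector:
    """Target name vector concatenated with the mean of
    [neighbor name vector ; neighbor offset encoding]."""
    ctx = [nv + offset_encoding(target, loc) for loc, nv in neighbors]
    mean_ctx = weighted_sum([1.0 / len(ctx)] * len(ctx), ctx)
    return name_vec + mean_ctx


def fuse(visual: Vector, textual: Vector) -> Vector:
    """Fusion by concatenation; a category classifier would consume this vector."""
    return visual + textual


if __name__ == "__main__":
    target = (39.9087, 116.3975)
    image_locs = [(39.9088, 116.3975), (39.9120, 116.4010)]
    image_feats = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for image-encoder outputs
    visual, weights = geo_attention_pool(target, image_locs, image_feats)
    textual = textual_representation(target, [0.5, 0.5],
                                     [((39.9090, 116.3980), [1.0, 0.0])])
    print(weights, len(fuse(visual, textual)))
```

Here the nearer street view image dominates the pooled visual feature. A learned attention (for example, scoring each image against the POI's name embedding as well as its distance) would be a natural alternative; the distance-only softmax is chosen only to keep the sketch short.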
Keywords:
Data Mining: DM: Mining spatial and/or temporal data