PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation
Deyi Ji, Wenwei Jin, Hongtao Lu, Feng Zhao
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 893-901.
https://doi.org/10.24963/ijcai.2024/99
The growing use of Unmanned Aerial Vehicles (UAVs) in various fields calls for effective UAV image segmentation, which is challenging because the perspective of UAV-captured images changes dynamically. Traditional segmentation algorithms falter because they cannot accurately model the complexity of UAV perspectives, and the cost of obtaining multi-perspective labeled datasets is prohibitive. To address these issues, we introduce PPTFormer, a novel Pseudo Multi-Perspective Transformer network for UAV image segmentation. Our approach circumvents the need for actual multi-perspective data by creating pseudo perspectives for enhanced multi-perspective learning. PPTFormer comprises Perspective Decomposition, novel Perspective Prototypes, and a specialized encoder and decoder that together achieve superior segmentation results through Pseudo Multi-Perspective Attention (PMP Attention) and fusion. Our experiments demonstrate that PPTFormer achieves state-of-the-art performance across five UAV segmentation datasets, confirming its capability to effectively simulate UAV flight perspectives and significantly improve segmentation precision. This work advances UAV scene understanding and sets a new benchmark for future developments in semantic segmentation.
Keywords:
Computer Vision: CV: Scene analysis and understanding
Computer Vision: CV: Segmentation