PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation

Deyi Ji; Wenwei Jin; Hongtao Lu; Feng Zhao

doi:10.24963/ijcai.2024/99

PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation

Deyi Ji, Wenwei Jin, Hongtao Lu, Feng Zhao

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence

Main Track. Pages 893-901. https://doi.org/10.24963/ijcai.2024/99

PDF BibTeX

The ascension of Unmanned Aerial Vehicles (UAVs) in various fields necessitates effective UAV image segmentation, which faces challenges due to the dynamic perspectives of UAV-captured images. Traditional segmentation algorithms falter as they cannot accurately mimic the complexity of UAV perspectives, and the cost of obtaining multi-perspective labeled datasets is prohibitive. To address these issues, we introduce the PPTFormer, a novel Pseudo Multi-Perspective Transformer network that revolutionizes UAV image segmentation. Our approach circumvents the need for actual multi-perspective data by creating pseudo perspectives for enhanced multi-perspective learning. The PPTFormer network boasts Perspective Decomposition, novel Perspective Prototypes, and a specialized encoder and decoder that together achieve superior segmentation results through Pseudo Multi-Perspective Attention (PMP Attention) and fusion. Our experiments demonstrate that PPTFormer achieves state-of-the-art performance across five UAV segmentation datasets, confirming its capability to effectively simulate UAV flight perspectives and significantly advance segmentation precision. This work presents a pioneering leap in UAV scene understanding and sets a new benchmark for future developments in semantic segmentation.

Keywords:

Computer Vision: CV: Scene analysis and understanding

Computer Vision: CV: Segmentation