CF-Deformable DETR: An End-to-End Alignment-Free Model for Weakly Aligned Visible-Infrared Object Detection
Haolong Fu, Jin Yuan, Guojin Zhong, Xuan He, Jiacheng Lin, Zhiyong Li
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 758-766.
https://doi.org/10.24963/ijcai.2024/84
Weakly aligned visible-infrared object detection poses significant challenges due to the imprecise alignment between visible and infrared images. Most existing methods explore alignment strategies between visible and infrared images, incurring prohibitive computation costs. This paper proposes the first end-to-end alignment-free architecture, the Cross-modal Fusion Deformable DEtection TRansformer ("CF-Deformable DETR"), for weakly aligned visible-infrared object detection. Abandoning traditional image alignment, CF-Deformable DETR introduces a simple yet effective cross-modal deformable attention mechanism that directly performs automatic cross-modal point mapping, generating well-aligned bimodal features with high efficiency. Moreover, we design a Point-level Feature Consistency Loss to guide the cross-modal point mapping, ensuring the consistency of paired features to support the subsequent fusion. Extensive experiments are conducted on three benchmark datasets. The results demonstrate that CF-Deformable DETR achieves comparable accuracy on weakly aligned and strictly aligned data and maintains largely stable performance under varying offset degrees of weakly aligned data. Code is available at https://github.com/116508/CF-Deformable-DETR.
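The abstract describes two core ideas: mapping reference points from the visible branch into the infrared feature map so that no explicit image alignment is needed, and a point-level consistency loss that keeps the paired sampled features in agreement. The sketch below illustrates these two ideas only at a conceptual level; it is not the authors' implementation. The offset head, the bilinear grid sampling, and the cosine form of the consistency loss are assumptions made for illustration; the released code at the repository above defines the actual mechanism.

```python
# Minimal sketch (assumptions labeled) of cross-modal point mapping and a
# point-level feature consistency loss, in the spirit of the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalPointMapping(nn.Module):
    """Predict per-query offsets from visible-branch queries and sample the
    infrared feature map at the mapped (offset) locations.
    The offset head and sampling scheme are hypothetical."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        # Hypothetical offset head: one (dx, dy) shift per query.
        self.offset_head = nn.Linear(d_model, 2)

    def forward(self, queries, ref_points, ir_feat):
        # queries:    (B, N, C)   visible-branch query embeddings
        # ref_points: (B, N, 2)   normalized reference points in [0, 1]
        # ir_feat:    (B, C, H, W) infrared feature map
        offsets = self.offset_head(queries).tanh() * 0.1   # small learned shift
        mapped = (ref_points + offsets).clamp(0.0, 1.0)    # mapped IR points
        grid = mapped * 2.0 - 1.0                          # grid_sample uses [-1, 1]
        sampled = F.grid_sample(
            ir_feat, grid.unsqueeze(2), align_corners=False
        )                                                  # (B, C, N, 1)
        return sampled.squeeze(-1).transpose(1, 2)         # (B, N, C)


def point_feature_consistency_loss(vis_feat, ir_feat):
    """Encourage paired visible/infrared point features to agree.
    A cosine-based form is used here for illustration; the paper's exact
    loss may differ."""
    cos = F.cosine_similarity(vis_feat, ir_feat, dim=-1)   # (B, N)
    return (1.0 - cos).mean()


if __name__ == "__main__":
    B, N, C, H, W = 2, 100, 256, 32, 32
    mapper = CrossModalPointMapping(C)
    queries = torch.randn(B, N, C)
    ref_points = torch.rand(B, N, 2)
    ir_map = torch.randn(B, C, H, W)
    vis_point_feat = torch.randn(B, N, C)       # stand-in visible point features
    ir_point_feat = mapper(queries, ref_points, ir_map)
    loss = point_feature_consistency_loss(vis_point_feat, ir_point_feat)
    print(loss.item())
```

Because the mapping is driven by learned offsets rather than a precomputed image registration, the same forward pass can absorb different degrees of misalignment, which is consistent with the stability across offset degrees reported in the abstract.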
Keywords:
Computer Vision: CV: Recognition (object detection, categorization)
Computer Vision: CV: Multimodal learning