PointTFA: Training-Free Clustering Adaption for Large 3D Point Cloud Models
Jinmeng Wu, Chong Cao, Hao Zhang, Basura Fernando, Yanbin Hao, Hanyu Hong
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1434-1442.
https://doi.org/10.24963/ijcai.2024/159
The success of contrastive learning models like CLIP, known for aligning 2D image-text pairs, has inspired the development of triplet alignment for Large 3D Point Cloud Models (3D-PCMs). Examples like ULIP integrate images, text, and point clouds into a unified semantic space. However, despite showing impressive zero-shot capabilities, frozen 3D-PCMs still fall short of fine-tuned methods, especially when downstream 3D datasets differ significantly from the upstream data. To address this, we propose a Data-Efficient, Training-Free 3D Adaptation method named PointTFA that adjusts ULIP outputs with representative samples. PointTFA comprises the Representative Memory Cache (RMC) for selecting a representative support set, Cloud Query Refactor (CQR) for reconstructing a query cloud using the support set, and a Training-Free 3D Adapter (3D-TFA) for inferring query categories from the support set. A key advantage of PointTFA is that it introduces no extra training parameters, yet it outperforms vanilla frozen ULIP and closely approaches few-shot fine-tuning methods on downstream point cloud classification benchmarks such as ModelNet10 & 40 and ScanObjectNN. The code is available at: https://github.com/CaoChong-git/PointTFA.
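To make the training-free idea concrete, here is a minimal, hypothetical sketch of a cache-based adapter in the spirit described above (the function name, parameters, and weighting scheme are illustrative assumptions, not the paper's exact formulation): cached support features vote for their labels, weighted by similarity to the query feature, with no learned parameters involved.

```python
import math

def training_free_adapter(query, support, labels, num_classes, beta=1.0):
    """Illustrative cache-based, training-free classifier sketch.

    Assumes `query` and each row of `support` are L2-normalized feature
    vectors (e.g., from a frozen encoder such as ULIP's point encoder).
    Each cached sample adds exp(-beta * (1 - cosine_sim)) to the logit
    of its own class; no parameters are trained.
    """
    logits = [0.0] * num_classes
    for feat, label in zip(support, labels):
        # Dot product equals cosine similarity for unit-length vectors.
        sim = sum(q * f for q, f in zip(query, feat))
        logits[label] += math.exp(-beta * (1.0 - sim))
    return logits

# Toy usage: two unit-vector support features, one per class.
support = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]
query = [0.8, 0.6]  # unit vector, closer to class 0
pred = max(range(2), key=lambda c: training_free_adapter(query, support, labels, 2)[c])
```

In this toy example the query is more similar to the class-0 cache entry, so the adapter predicts class 0. Real systems would additionally blend these cache logits with the frozen model's zero-shot logits.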
Keywords:
Computer Vision: CV: 3D computer vision
Computer Vision: CV: Recognition (object detection, categorization)