MARS: Multimodal Active Robotic Sensing for Articulated Characterization

Hongliang Zeng, Ping Zhang, Chengjiong Wu, Jiahua Wang, Tingyu Ye, Fang Li

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1634-1642. https://doi.org/10.24963/ijcai.2024/181

Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point clouds, a single-modal approach that often neglects vital texture and lighting details and assumes ideal conditions such as optimal viewpoints, which are unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characterization. It features a multimodal fusion module that uses multi-scale RGB features to enhance point cloud features, coupled with reinforcement learning-based active sensing for autonomous optimization of observation viewpoints. In experiments conducted with various articulated object instances from the PartNet-Mobility dataset, our method outperforms current state-of-the-art methods in joint parameter estimation accuracy. Additionally, through active sensing, MARS further reduces errors, demonstrating enhanced efficiency in handling suboptimal viewpoints. Furthermore, our method effectively generalizes to real-world articulated objects, enhancing robot interactions. Code is available at https://github.com/robhlzeng/MARS.
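To make the fusion idea concrete, the following is a minimal sketch of how multi-scale RGB features could enhance per-point features: 3D points are projected into the image, each CNN feature scale is bilinearly sampled at the projected locations, and the samples are concatenated with the point features and fused by an MLP. This assumes a PyTorch implementation; all module names, feature dimensions, and the sampling-based fusion scheme here are illustrative assumptions rather than the authors' actual code, which is available in the linked repository.

```python
# Illustrative sketch of multi-scale RGB -> point-cloud feature fusion.
# Names, shapes, and the fusion scheme are assumptions; the authors'
# actual module is at https://github.com/robhlzeng/MARS.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RGBPointFusion(nn.Module):
    def __init__(self, point_dim=128, rgb_dims=(64, 128, 256), out_dim=128):
        super().__init__()
        # One linear projection per RGB feature scale (hypothetical choice).
        self.proj = nn.ModuleList(nn.Linear(c, out_dim) for c in rgb_dims)
        self.fuse = nn.Sequential(
            nn.Linear(point_dim + out_dim * len(rgb_dims), out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, point_feats, uv, rgb_feats):
        # point_feats: (B, N, point_dim) per-point features.
        # uv:          (B, N, 2) points projected to image coords in [-1, 1].
        # rgb_feats:   list of (B, C_s, H_s, W_s) multi-scale CNN feature maps.
        sampled = []
        grid = uv.unsqueeze(2)  # (B, N, 1, 2) grid for grid_sample
        for fmap, proj in zip(rgb_feats, self.proj):
            # Bilinearly sample each scale at the projected pixel locations.
            s = F.grid_sample(fmap, grid, align_corners=False)  # (B, C_s, N, 1)
            s = s.squeeze(-1).transpose(1, 2)                   # (B, N, C_s)
            sampled.append(proj(s))
        # Concatenate point features with all sampled RGB features and fuse.
        return self.fuse(torch.cat([point_feats] + sampled, dim=-1))

# Toy usage with random tensors.
if __name__ == "__main__":
    B, N = 2, 1024
    fusion = RGBPointFusion()
    pf = torch.randn(B, N, 128)
    uv = torch.rand(B, N, 2) * 2 - 1
    maps = [torch.randn(B, c, 64 // 2**i, 64 // 2**i)
            for i, c in enumerate((64, 128, 256))]
    print(fusion(pf, uv, maps).shape)  # torch.Size([2, 1024, 128])
```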
Keywords:
Computer Vision: CV: 3D computer vision
Computer Vision: CV: Multimodal learning
Robotics: ROB: Perception
Robotics: ROB: Manipulation