MOSER: Learning Sensory Policy for Task-specific Viewpoint via View-conditional World Model

Shenghua Wan, Hai-Hang Sun, Le Gan, De-Chuan Zhan

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 5046-5054. https://doi.org/10.24963/ijcai.2024/558

Reinforcement learning from visual observations is a challenging problem with many real-world applications. Existing algorithms mostly rely on a single observation from a well-designed fixed camera, whose placement requires human expertise. Recent studies learn from multiple fixed cameras with different viewpoints, but this incurs high computation and storage costs and does not guarantee coverage of the optimal viewpoint. To alleviate these limitations, we propose a straightforward View-conditional Partially Observable Markov Decision Process (VPOMDP) assumption and develop a new method, the MOdel-based SEnsor controlleR (MOSER). MOSER jointly learns a view-conditional world model (VWM) to simulate the environment, a sensory policy to control the camera, and a motor policy to complete tasks. We design intrinsic rewards from the VWM, without additional modules, to guide the sensory policy in adjusting the camera parameters. Experiments on locomotion and manipulation tasks demonstrate that MOSER autonomously discovers task-specific viewpoints and significantly outperforms most baseline methods.
Keywords:
Machine Learning: ML: Reinforcement learning
Machine Learning: ML: Model-based and model learning reinforcement learning
Machine Learning: ML: Partially observable reinforcement learning and POMDPs
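The abstract describes three jointly trained components: a view-conditional world model, a sensory policy that outputs camera parameters, and a motor policy that outputs task actions, with an intrinsic reward derived from the world model. The following is a minimal, hypothetical sketch of that structure only; all network sizes, losses, and the particular intrinsic-reward form (negative model prediction error) are illustrative assumptions, not the paper's actual design.

```python
# Illustrative sketch of a view-conditional world model trained jointly with
# a sensory (camera) policy and a motor (task) policy. Shapes, losses, and
# the intrinsic reward are assumed for demonstration, not taken from the paper.
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, ACT_DIM, VIEW_DIM = 64, 32, 6, 3  # assumed sizes

class ViewConditionalWorldModel(nn.Module):
    """Encodes observations conditioned on view parameters and
    predicts latent dynamics from (latent, motor action, view)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM + VIEW_DIM, LATENT_DIM)
        self.dynamics = nn.Linear(LATENT_DIM + ACT_DIM + VIEW_DIM, LATENT_DIM)
        self.decoder = nn.Linear(LATENT_DIM, OBS_DIM)

    def encode(self, obs, view):
        return self.encoder(torch.cat([obs, view], dim=-1))

    def predict(self, latent, action, view):
        return self.dynamics(torch.cat([latent, action, view], dim=-1))

world_model = ViewConditionalWorldModel()
motor_policy = nn.Linear(LATENT_DIM, ACT_DIM)    # task actions
sensory_policy = nn.Linear(LATENT_DIM, VIEW_DIM) # camera parameters
optimizer = torch.optim.Adam(
    [*world_model.parameters(),
     *motor_policy.parameters(),
     *sensory_policy.parameters()],
    lr=1e-3,
)

# One illustrative update on a fake transition; a real agent would collect
# (obs, action, next_obs) from the environment under the chosen viewpoint.
obs, next_obs = torch.randn(1, OBS_DIM), torch.randn(1, OBS_DIM)

init_latent = world_model.encode(obs, torch.zeros(1, VIEW_DIM))
view = sensory_policy(init_latent).tanh()   # camera parameters in [-1, 1]
latent = world_model.encode(obs, view)      # re-encode under the chosen view
action = motor_policy(latent).tanh()

pred_next = world_model.predict(latent, action, view)
model_loss = (world_model.decoder(pred_next) - next_obs).pow(2).mean()

# Assumed intrinsic reward for the sensory policy: negative model error,
# i.e. prefer viewpoints the world model can predict well.
intrinsic_reward = -model_loss.detach()

loss = model_loss  # task-reward / actor-critic terms omitted for brevity
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"model loss {loss.item():.4f}, intrinsic reward {intrinsic_reward.item():.4f}")
```

In this toy form a single reconstruction loss drives all three modules; the actual method presumably trains the motor policy on task reward and the sensory policy on the VWM-derived intrinsic reward, as the abstract states, with details in the paper at the DOI above.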