Learning Environmental Calibration Actions for Policy Self-Evolution

Chao Zhang, Yang Yu, Zhi-Hua Zhou

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 3061-3067. https://doi.org/10.24963/ijcai.2018/425

Reinforcement learning in the physical world is often expensive, so simulators are commonly employed to train policies. Due to simulation error, however, policies trained in simulators are hard to deploy directly in the physical world, and efficiently reusing these policies in the real environment becomes a key issue. To address this issue, this paper presents a policy self-evolution process: in the target environment, the agent first executes a few calibration actions to perceive the environment, and then reuses the previously trained policies according to the resulting observations. In this way, policy learning in the target environment is reduced to environment identification through the calibration actions, which requires far fewer samples than learning a policy from scratch. We propose the POSEC (POlicy Self-Evolution by Calibration) approach, which learns the most informative calibration actions for policy self-evolution. Taking three robotic arm control tasks as test beds, we show that the proposed method can learn a good policy for a new arm with only a few (e.g., five) samples of the target environment.
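To make the calibrate-then-reuse loop concrete, the sketch below illustrates the process the abstract describes, assuming a gym-style environment interface; the linear identifier, the policy library structure, and all names here are illustrative placeholders, not the paper's actual POSEC implementation.

```python
import numpy as np

class LibraryEntry:
    """A pretrained policy tagged with the parameters of its source environment."""
    def __init__(self, env_params, policy):
        self.env_params = np.asarray(env_params)
        self.policy = policy

def identify_environment(calib_obs, W, b):
    """Toy identifier: a linear map from concatenated calibration
    observations to estimated environment parameters (hypothetical)."""
    return W @ calib_obs + b

def self_evolve(env, calibration_actions, library, W, b):
    """Execute a few calibration actions in the target environment,
    identify it, and reuse the nearest pretrained policy instead of
    learning a new one from scratch."""
    obs_list = []
    env.reset()
    for a in calibration_actions:  # only a handful of probing actions
        obs, _reward, _done, _info = env.step(a)  # assumes a gym-like API
        obs_list.append(np.asarray(obs).ravel())
    theta_hat = identify_environment(np.concatenate(obs_list), W, b)
    # Reuse the library policy whose source environment is closest
    # to the identified parameters.
    best = min(library, key=lambda e: np.linalg.norm(e.env_params - theta_hat))
    return best.policy
```

The sample cost of this loop is just the number of calibration actions (e.g., five steps for a new robotic arm), which is what lets it replace full policy learning in the target environment; the paper's contribution is learning which calibration actions are most informative, a step this sketch leaves abstract.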
Keywords:
Machine Learning: Reinforcement Learning
Machine Learning: Transfer, Adaptation, Multi-task Learning