Continual Multi-Objective Reinforcement Learning via Reward Model Rehearsal
Lihe Li, Ruotong Chen, Ziqian Zhang, Zhichao Wu, Yi-Chen Li, Cong Guan, Yang Yu, Lei Yuan
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 4434-4442.
https://doi.org/10.24963/ijcai.2024/490
Multi-objective reinforcement learning (MORL) approaches address real-world problems with multiple objectives by learning policies that maximize returns weighted by different user preferences. Typical methods assume the objectives remain unchanged throughout the agent's lifetime. However, in some real-world situations, the agent may encounter dynamically changing learning objectives, i.e., different vector-valued reward functions at different learning stages. Existing problem formulations and algorithm designs do not account for this issue. To address it, we formalize the setting as a continual MORL (CMORL) problem for the first time, accounting for the evolution of objectives throughout the learning process. We then propose Continual Multi-Objective Reinforcement Learning via Reward Model Rehearsal (CORe3), which incorporates a dynamic agent network for rapid adaptation to new objectives. Moreover, we develop a reward model rehearsal technique to recover the reward signals for previous objectives, thus alleviating catastrophic forgetting. Experiments on four CMORL benchmarks show that CORe3 effectively learns policies satisfying different preferences on all encountered objectives and outperforms the best baseline by 171%, highlighting the capability of CORe3 to handle situations with evolving objectives.
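To make the rehearsal idea in the abstract concrete, here is a minimal sketch, not the authors' implementation: learned reward models for previously seen objectives relabel transitions collected on the current objective, and the resulting vector reward is scalarized by a user preference. All names (RewardModel, rehearse_rewards, scalarize) and dimensions are hypothetical illustrations, not the paper's API.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Hypothetical learned reward model for one past objective: (s, a) -> scalar reward."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def rehearse_rewards(past_models, obs, act, new_reward):
    """Rebuild a full vector reward: predictions from frozen reward models of past
    objectives, concatenated with the observed reward of the current objective."""
    with torch.no_grad():
        past = [m(obs, act) for m in past_models]       # one scalar per past objective
    return torch.stack(past + [new_reward], dim=-1)     # shape: (batch, num_objectives)

def scalarize(vector_reward, preference):
    """Preference-weighted scalarization used in MORL policy updates: w^T r."""
    return (vector_reward * preference).sum(dim=-1)

# Toy usage on a random batch (stand-in for transitions gathered on the new objective).
obs_dim, act_dim, batch = 8, 2, 32
past_models = [RewardModel(obs_dim, act_dim) for _ in range(2)]  # two earlier objectives
obs, act = torch.randn(batch, obs_dim), torch.randn(batch, act_dim)
new_reward = torch.randn(batch)                                  # reward from the current objective
r_vec = rehearse_rewards(past_models, obs, act, new_reward)      # (32, 3)
w = torch.tensor([0.2, 0.3, 0.5])                                # a sampled user preference
scalar_r = scalarize(r_vec, w)                                   # training signal for the policy
```

Under these assumptions, the agent keeps receiving (approximate) reward signals for objectives whose true reward functions are no longer available, which is how rehearsal counteracts catastrophic forgetting.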
Keywords:
Machine Learning: ML: Reinforcement learning
Machine Learning: ML: Incremental learning
Machine Learning: ML: Optimization