Population-Based Diverse Exploration for Sparse-Reward Multi-Agent Tasks

Pei Xu, Junge Zhang, Kaiqi Huang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 283-291. https://doi.org/10.24963/ijcai.2024/32

Exploration under sparse rewards is a key challenge in multi-agent reinforcement learning. Although population-based learning has shown potential for producing diverse behaviors, most previous work still focuses on improving the exploration of a single joint policy. In this paper, we show that with a suitable exploration method, maintaining a population of joint policies rather than a single joint policy can significantly improve exploration. Our key idea is to guide each member of the population to explore a different region of the environment. To this end, we propose a member-aware exploration objective that explicitly guides each member to maximize its deviation from the regions already explored by the other members, thus forcing the members to explore different regions. In addition, we propose an exploration-enhanced policy constraint that guides each member toward a joint policy that is both different from those of the other members and conducive to exploration, further increasing the probability of covering different regions. Under the reward-free setting, our method achieves a 72% average improvement in the number of explored states over classical exploration methods in the multiple-particle environment. Moreover, under the sparse-reward setting, the proposed method significantly outperforms state-of-the-art methods in the multiple-particle environment, Google Research Football, and StarCraft II micromanagement tasks.
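
The member-aware idea of rewarding deviation from regions already covered by other population members can be illustrated with a small count-based sketch. The Python snippet below is a minimal illustration under assumed simplifications (a discrete state space and per-member visit counts); the names member_aware_bonus, member_counts, and beta are hypothetical and the paper does not prescribe this exact form.

from collections import Counter

def member_aware_bonus(state, member_id, member_counts, beta=1.0):
    """Intrinsic bonus for `member_id` visiting `state`.

    Decays with the member's own visit count (standard count-based novelty)
    and is further discounted by how often the *other* members have already
    visited the state, pushing members toward distinct regions.
    """
    own = member_counts[member_id][state]
    others = sum(counts[state] for m, counts in member_counts.items() if m != member_id)
    return beta / ((1.0 + own) * (1.0 + others))

# Example: three population members; member 2 gets the largest bonus for a
# state that no member has visited yet.
member_counts = {m: Counter() for m in range(3)}
member_counts[0].update(["s0", "s0", "s1"])
member_counts[1].update(["s1"])
print(member_aware_bonus("s0", 2, member_counts))  # low: other members covered s0
print(member_aware_bonus("s2", 2, member_counts))  # high: unexplored by all members

In this sketch the bonus plays the role of an intrinsic reward added during training of each member; the paper's actual objective and the exploration-enhanced policy constraint are defined over joint policies and are not reproduced here.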
Keywords:
Agent-based and Multi-agent Systems: MAS: Multi-agent learning
Machine Learning: ML: Reinforcement learning