SAEIR: Sequentially Accumulated Entropy Intrinsic Reward for Cooperative Multi-Agent Reinforcement Learning with Sparse Reward

Xin He, Hongwei Ge, Yaqing Hou, Jincheng Yu

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 4107-4115. https://doi.org/10.24963/ijcai.2024/454

Multi-agent reinforcement learning (MARL) performs well on complex cooperative tasks when the scenario provides well-defined dense rewards. However, rewards are often sparse in real-world multi-agent systems, which makes it difficult for MARL algorithms to learn an effective strategy. To tackle this problem, we propose a novel sequentially accumulated entropy intrinsic reward, named SAEIR, which uses the entropy of the multi-agent system as a bonus to accelerate learning. Specifically, a multi-scale hypergraph critic is proposed to obtain a high-order representation of the system state, which also strengthens the critic's ability to evaluate the actions produced by the actor. Based on this comprehensive and compact state representation, the orderliness of the multi-agent system can be measured to identify highly valuable states at which entropy-based intrinsic rewards are added, leading to a highly efficient learning process. Empirical results demonstrate that our method achieves state-of-the-art performance in several complex cooperative multi-agent environments with sparse reward settings.
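To make the core idea concrete, the sketch below illustrates one plausible reading of an entropy-based intrinsic reward: estimate the Shannon entropy of the joint system state and grant a bonus when a transition makes the system more ordered (entropy decreases). This is a minimal illustration, not the authors' implementation; the function names (state_entropy, intrinsic_reward), the histogram-based entropy estimator, and the bonus coefficient beta are assumptions, and the paper's actual method uses a sequentially accumulated entropy computed over hypergraph-critic state representations.

```python
import numpy as np

def state_entropy(state_features, num_bins=16):
    """Shannon entropy of a flattened system-state representation,
    estimated from a histogram over feature values (illustrative only)."""
    hist, _ = np.histogram(state_features, bins=num_bins)
    probs = hist / max(hist.sum(), 1)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log(probs))

def intrinsic_reward(prev_state_features, state_features, beta=0.1):
    """Bonus for a decrease in system entropy (increase in orderliness).
    Grants a reward only when the transition makes the system more ordered."""
    delta = state_entropy(prev_state_features) - state_entropy(state_features)
    return beta * max(delta, 0.0)

# Usage: combine the bonus with the (sparse) extrinsic reward.
prev_s = np.random.rand(32)  # placeholder joint-state embedding at step t-1
s = np.random.rand(32)       # placeholder joint-state embedding at step t
r_ext = 0.0                  # sparse environment reward
r_total = r_ext + intrinsic_reward(prev_s, s)
```

In this sketch the bonus is clipped at zero so that only order-increasing transitions are rewarded, mirroring the paper's idea of adding intrinsic rewards only at highly valuable states rather than uniformly.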
Keywords:
Machine Learning: ML: Multiagent Reinforcement Learning
Agent-based and Multi-agent Systems: MAS: Coordination and cooperation
Agent-based and Multi-agent Systems: MAS: Multi-agent learning