Hybrid Learning for Multi-agent Cooperation with Sub-optimal Demonstrations

Peixi Peng, Junliang Xing, Lili Cao

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence
Main track. Pages 3037-3043. https://doi.org/10.24963/ijcai.2020/420

This paper aims to learn multi-agent cooperation where each agent performs its actions in a decentralized way. In this setting, it is very challenging to learn decentralized policies when the rewards are global and sparse. Recently, learning from demonstrations (LfD) has provided a promising way to handle this challenge. However, in many practical tasks the available demonstrations are often sub-optimal. To learn better policies from such sub-optimal demonstrations, this paper follows the centralized-learning, decentralized-execution framework and proposes a novel hybrid learning method based on multi-agent actor-critic. First, the expert trajectory returns generated from the demonstration actions are used to pre-train the centralized critic network. Then, multi-agent decisions are made by best response dynamics based on the critic and used to train the decentralized actor networks. Finally, the demonstrations are updated by the actor networks, and the critic and actor networks are learned jointly by running the above two steps alternately. We evaluate the proposed approach on a real-time strategy combat game. Experimental results show that it outperforms many competing demonstration-based methods.
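To make the three-step loop concrete, the following is a minimal PyTorch sketch written only from the description above; it is not the authors' implementation. The network shapes, hyper-parameters, and helper names (Critic, Actor, mc_returns, best_response) are illustrative assumptions, discrete actions and a shared global state are assumed, and the demonstration rewards are kept fixed across iterations for brevity, whereas in practice the updated demonstrations would presumably be re-evaluated in the environment.

import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, STATE_DIM, N_ACTIONS, GAMMA = 3, 16, 5, 0.99

class Critic(nn.Module):
    """Centralized critic Q(s, a_1..a_N); the joint action enters as concatenated one-hots."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_AGENTS * N_ACTIONS, 64), nn.ReLU(),
            nn.Linear(64, 1))
    def forward(self, state, joint_oh):
        return self.net(torch.cat([state, joint_oh], dim=-1)).squeeze(-1)

class Actor(nn.Module):
    """Decentralized actor: per-agent action logits from its (local) observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))
    def forward(self, obs):
        return self.net(obs)

def mc_returns(rewards):
    """Discounted Monte Carlo returns of one demonstration trajectory."""
    out, g = [], 0.0
    for r in reversed(rewards):
        g = r + GAMMA * g
        out.append(g)
    return list(reversed(out))

def joint_onehot(joint):
    return F.one_hot(joint, N_ACTIONS).float().flatten()

def best_response(critic, state, n_sweeps=3):
    """Best response dynamics: agents take turns greedily improving their own
    action against the others' current actions under the centralized critic."""
    joint = torch.zeros(N_AGENTS, dtype=torch.long)
    with torch.no_grad():
        for _ in range(n_sweeps):
            for i in range(N_AGENTS):
                trials = joint.repeat(N_ACTIONS, 1)        # one candidate row per action
                trials[:, i] = torch.arange(N_ACTIONS)
                qs = torch.stack([critic(state, joint_onehot(t)) for t in trials])
                joint[i] = qs.argmax()
    return joint

# Random stand-ins for a sub-optimal demonstration trajectory.
T = 20
states = torch.randn(T, STATE_DIM)
demo_actions = torch.randint(0, N_ACTIONS, (T, N_AGENTS))
rewards = torch.rand(T).tolist()                           # global (possibly sparse) rewards

critic = Critic()
actors = [Actor() for _ in range(N_AGENTS)]
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
opt_a = torch.optim.Adam([p for a in actors for p in a.parameters()], lr=1e-3)

for it in range(5):                                        # alternate the steps for joint training
    # Step 1: fit the centralized critic to demonstration-trajectory returns.
    returns = torch.tensor(mc_returns(rewards))
    q = torch.stack([critic(states[t], joint_onehot(demo_actions[t])) for t in range(T)])
    loss_c = F.mse_loss(q, returns)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()

    # Step 2: derive joint actions by best response dynamics; train the actors on them.
    targets = torch.stack([best_response(critic, states[t]) for t in range(T)])
    loss_a = sum(F.cross_entropy(actors[i](states), targets[:, i]) for i in range(N_AGENTS))
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()

    # Step 3: overwrite the demonstration actions with the actors' current choices.
    with torch.no_grad():
        demo_actions = torch.stack([actors[i](states).argmax(-1)
                                    for i in range(N_AGENTS)], dim=1)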
Keywords:
Machine Learning: Reinforcement Learning
Agent-based and Multi-agent Systems: Multi-agent Learning
Multidisciplinary Topics and Applications: Computer Games