Beyond Strict Competition: Approximate Convergence of Multi-agent Q-Learning Dynamics

Aamal Hussain, Francesco Belardinelli, Georgios Piliouras

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 135-143. https://doi.org/10.24963/ijcai.2023/16

The behaviour of multi-agent learning in competitive settings is often studied under the restrictive assumption of a zero-sum game. Only under this strict requirement is the behaviour of learning well understood; beyond it, learning dynamics can display non-convergent behaviours which prevent fixed-point analysis. Nonetheless, many relevant competitive games do not satisfy the zero-sum assumption. Motivated by this, we study a smooth variant of Q-Learning, a popular model of reinforcement learning dynamics which balances the agents' tendency to maximise their payoffs with their propensity to explore the state space. We examine these dynamics in games which are 'close' to network zero-sum games and find that Q-Learning converges to a neighbourhood around a unique equilibrium. The size of the neighbourhood is determined by the 'distance' to the zero-sum game, as well as by the exploration rates of the agents. We complement these results by providing a method whereby, given an arbitrary network game, the 'nearest' network zero-sum game can be found efficiently. Importantly, our theoretical guarantees apply widely across game settings, regardless of whether the dynamics ultimately reach an equilibrium or remain non-convergent.
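
To make the setting concrete, the sketch below simulates one common continuous-time form of smooth (Boltzmann) Q-Learning dynamics in a two-player matrix game, and projects an arbitrary bimatrix game onto its nearest zero-sum counterpart in Frobenius norm. This is an illustrative reconstruction under my own assumptions, not the authors' method or code: the specific dynamics equation, the projection formula, the payoff matrices, the temperature T, and the step size are all placeholders chosen for illustration; the paper's results concern network games and a different notion of 'nearest' may be used there.

import numpy as np

def q_learning_flow(x, y, A, T):
    """Vector field of smooth Q-Learning dynamics for one player (assumed form):
    a replicator term plus a temperature-weighted entropy (exploration) term."""
    payoff = A @ y                         # expected payoff of each pure action
    avg = x @ payoff                       # expected payoff of the mixed strategy x
    entropy_term = np.log(x) - x @ np.log(x)
    return x * (payoff - avg - T * entropy_term)

def simulate(A, B, T=0.1, steps=20000, dt=0.01, seed=0):
    """Euler-discretised joint dynamics for a two-player game with payoff matrices (A, B)."""
    rng = np.random.default_rng(seed)
    x = rng.dirichlet(np.ones(A.shape[0]))   # row player's mixed strategy
    y = rng.dirichlet(np.ones(A.shape[1]))   # column player's mixed strategy
    for _ in range(steps):
        dx = q_learning_flow(x, y, A, T)
        dy = q_learning_flow(y, x, B.T, T)   # column player's payoffs, transposed to its view
        x = np.clip(x + dt * dx, 1e-12, None); x /= x.sum()
        y = np.clip(y + dt * dy, 1e-12, None); y /= y.sum()
    return x, y

def nearest_zero_sum(A, B):
    """Frobenius-norm projection of a bimatrix game (A, B) onto the set B' = -A'."""
    A_zs = (A - B) / 2.0
    return A_zs, -A_zs

if __name__ == "__main__":
    # A slightly perturbed matching-pennies game: not exactly zero-sum,
    # but 'close' to its zero-sum projection.
    A = np.array([[1.0, -1.0], [-1.0, 1.0]]) + 0.05
    B = -np.array([[1.0, -1.0], [-1.0, 1.0]])
    A_zs, B_zs = nearest_zero_sum(A, B)
    dist = np.sqrt(np.linalg.norm(A - A_zs)**2 + np.linalg.norm(B - B_zs)**2)
    print("distance to nearest zero-sum game:", dist)
    print("limit point of smooth Q-Learning:", simulate(A, B, T=0.5))

With a sufficiently high temperature T the trajectory settles near a unique interior point, while the printed 'distance' gives a rough sense of how far the perturbed game lies from the zero-sum set in this toy example.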
Keywords:
Agent-based and Multi-agent Systems: MAS: Multi-agent learning
Game Theory and Economic Paradigms: GTEP: Noncooperative games
Machine Learning: ML: Reinforcement learning