Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization

Yunfan Zhao, Nikhil Behari, Edward Hughes, Edwin Zhang, Dheeraj Nagaraj, Karl Tuyls, Aparna Taneja, Milind Tambe

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 321-329. https://doi.org/10.24963/ijcai.2024/36

Restless multi-arm bandits (RMABs) are a class of resource allocation problems with broad applications in areas such as healthcare, online advertising, and anti-poaching. We explore several important questions, such as how to handle arms opting in and opting out over time without frequent retraining from scratch, and how to deal with continuous state settings with nonlinear reward functions, both of which arise naturally in practical contexts. We address these questions by developing a pretrained model (PreFeRMAB) based on a novel combination of three key ideas: (i) to enable fast generalization, we train agents to learn from each other's experience; (ii) to accommodate streaming RMABs, we derive a new update rule for a crucial $\lambda$-network; (iii) to handle more complex continuous state settings, we design the algorithm to automatically define an abstract state based on raw observation and reward data. PreFeRMAB offers general zero-shot ability on previously unseen RMABs and can be fine-tuned on specific instances more sample-efficiently than retraining from scratch. We theoretically prove the benefits of multi-arm generalization and empirically demonstrate the advantages of our approach on several challenging, real-world inspired problems.
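To make the multi-arm generalization idea concrete, the sketch below illustrates one plausible architecture in Python/PyTorch: a single policy network shared across all arms scores each arm from its own observation, while a separate $\lambda$-network pools over the arm population to estimate the Lagrange multiplier coupling the arms through the budget constraint. This is a minimal illustration under stated assumptions, not the authors' implementation; all names (SharedArmPolicy, LambdaNetwork, obs_dim, and so on) are hypothetical, and the paper's actual $\lambda$-network update rule is not reproduced here.

    # Minimal sketch (not the authors' code) of multi-arm generalization:
    # one shared policy network scores every arm, and a lambda-network
    # estimates the budget-constraint multiplier. All names hypothetical.
    import torch
    import torch.nn as nn

    class SharedArmPolicy(nn.Module):
        """One network shared across arms: each arm's observation is
        scored independently, so arms can opt in or out at any time."""
        def __init__(self, obs_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, arm_obs: torch.Tensor) -> torch.Tensor:
            # arm_obs: (num_arms, obs_dim) -> per-arm pull scores (num_arms,)
            return self.net(arm_obs).squeeze(-1)

    class LambdaNetwork(nn.Module):
        """Maps a summary of the current arm population to an estimate
        of the Lagrange multiplier lambda for the budget constraint."""
        def __init__(self, obs_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, arm_obs: torch.Tensor) -> torch.Tensor:
            # Mean-pool over arms so the estimate is invariant to the
            # number of arms -- the property that accommodates arms
            # streaming in and out without retraining.
            return self.net(arm_obs.mean(dim=0))

    # Usage: act on an unseen RMAB instance with a budget of k pulls.
    num_arms, obs_dim, budget = 10, 4, 3
    policy, lam_net = SharedArmPolicy(obs_dim), LambdaNetwork(obs_dim)
    obs = torch.randn(num_arms, obs_dim)         # per-arm observations
    scores = policy(obs) - lam_net(obs)          # lambda-adjusted indices
    action = torch.topk(scores, budget).indices  # pull the top-k arms

Because the per-arm scoring and the pooled $\lambda$ estimate are both independent of the number of arms, a model of this shape can in principle be applied zero-shot to RMAB instances with different arm counts, which is the flexibility the abstract describes.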
Keywords:
Agent-based and Multi-agent Systems: MAS: Multi-agent learning
Agent-based and Multi-agent Systems: MAS: Resource allocation
Machine Learning: ML: Few-shot learning
Agent-based and Multi-agent Systems: MAS: Multi-agent planning