Bandits with Concave Aggregated Reward

Yingqi Yu, Sijia Zhang, Shaoang Li, Lan Zhang, Wei Xie, Xiang-Yang Li

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 5398-5406. https://doi.org/10.24963/ijcai.2024/597

The multi-armed bandit is a simple but powerful algorithmic framework, and many effective algorithms have been proposed for various online models. In numerous applications, however, the decision-maker faces diminishing marginal utility, and under such non-linear reward aggregation, existing algorithms often have poor regret bounds. Motivated by this, we study a bandit problem with diminishing marginal utility, which we term bandits with concave aggregated reward (BCAR). To tackle this problem, we propose two algorithms, SW-BCAR and SWUCB-BCAR. Through theoretical analysis, we establish the effectiveness of these algorithms in addressing the BCAR problem. Extensive simulations demonstrate that our algorithms outperform state-of-the-art bandit algorithms.
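To make the setting concrete, the following is a minimal, hypothetical sketch: a standard UCB1 policy on Bernoulli arms, evaluated under an illustrative concave aggregation (square root of each arm's cumulative reward, modeling diminishing marginal utility). This is not the paper's SW-BCAR or SWUCB-BCAR algorithm; the arm means, horizon, and aggregation function are all assumptions chosen for illustration.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run standard UCB1 on Bernoulli arms; return per-arm cumulative rewards.

    A generic baseline policy, not the paper's SWUCB-BCAR algorithm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # UCB index: empirical mean plus exploration bonus
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return sums

def concave_aggregate(per_arm_sums):
    # Illustrative concave aggregation: sqrt of each arm's total reward.
    # Marginal utility of repeatedly rewarding the same arm diminishes,
    # so a policy that over-concentrates on one arm is penalized.
    return sum(math.sqrt(s) for s in per_arm_sums)

per_arm = ucb1([0.3, 0.5, 0.7], horizon=2000)
print(concave_aggregate(per_arm))
```

Under such an objective, a policy maximizing the linear sum of rewards is no longer optimal, which is the gap the paper's algorithms are designed to close.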
Keywords:
Machine Learning: ML: Multi-armed bandits