Toward Optimal Solution for the Context-Attentive Bandit Problem

Djallel Bouneffouf, Raphael Feraud, Sohini Upadhyay, Irina Rish, Yasaman Khazaeni

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 3493-3500. https://doi.org/10.24963/ijcai.2021/481

In various recommender system applications, from medical diagnosis to dialog systems, observation costs mean that only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent is free to choose which variables to observe. In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating the advantages of the proposed approach over several baseline methods on a variety of real-life datasets.
Keywords:
Machine Learning Applications: Applications of Reinforcement Learning
Data Mining: Big Data, Large-Scale Systems
Data Mining: Recommender Systems