IMO^3: Interactive Multi-Objective Off-Policy Optimization

Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 3523-3529. https://doi.org/10.24963/ijcai.2022/489

Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. We instead consider the more practical but more challenging setting of unknown objective functions. In industry, optimization in this setting is mostly approached with online A/B testing, which is often costly and inefficient. As an alternative, we propose Interactive Multi-Objective Off-policy Optimization (IMO^3). The key idea of IMO^3 is to interact with a system designer, using policies evaluated in an off-policy fashion, to uncover which policy maximizes her unknown utility function. We show theoretically that IMO^3 identifies a near-optimal policy with high probability, with a guarantee that depends on the amount of feedback from the designer and the amount of training data for off-policy estimation. We demonstrate its effectiveness empirically on several multi-objective optimization problems.
Keywords:
Machine Learning: Experimental Methodology
Machine Learning: Optimisation
Uncertainty in AI: Decision and Utility Theory
Knowledge Representation and Reasoning: Preference Modelling and Preference-Based Reasoning
Humans and AI: Human-AI Collaboration
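
Illustrative sketch (not taken from the paper): the abstract describes evaluating candidate policies off-policy, from logged data, before showing them to the designer. The snippet below illustrates that step with a plain inverse propensity scoring (IPS) estimator, which is an assumption here since the abstract does not name the estimator. The names ips_objective_values and make_policy, the log format, and the toy data are all hypothetical.

```python
def ips_objective_values(logs, target_policy, num_objectives):
    """Estimate the expected value of each objective under target_policy.

    logs: list of (context, action, logging_prob, rewards) tuples, where
          logging_prob is the probability the logging policy gave to the
          logged action and rewards holds one observed value per objective.
    target_policy(context, action): probability that the candidate policy
          would choose `action` in `context`.
    """
    totals = [0.0] * num_objectives
    for context, action, logging_prob, rewards in logs:
        # Importance weight: reweights logged data toward the candidate policy.
        weight = target_policy(context, action) / logging_prob
        for i in range(num_objectives):
            totals[i] += weight * rewards[i]
    return [total / len(logs) for total in totals]


def make_policy(p_action0):
    """Hypothetical context-free policy over the two actions {0, 1}."""
    def policy(context, action):
        return p_action0 if action == 0 else 1.0 - p_action0
    return policy


# Toy logged data (assumed): a uniform logging policy over two actions.
# Action 0 pays off only on objective 0, action 1 only on objective 1.
logs = [(None, a, 0.5, (1.0, 0.0) if a == 0 else (0.0, 1.0))
        for a in [0, 1] * 50]

# Two candidate policies whose estimated objective vectors could be
# shown to a designer for comparison, without deploying either one.
candidates = {"A": make_policy(0.8), "B": make_policy(0.3)}
estimates = {name: ips_objective_values(logs, policy, 2)
             for name, policy in candidates.items()}

for name, values in estimates.items():
    print(name, [round(v, 3) for v in values])
```

On this toy data the estimates come out at roughly (0.8, 0.2) for candidate A and (0.3, 0.7) for candidate B, so the designer would be choosing between two different trade-offs of the same pair of objectives. How such comparisons are turned into a model of her utility is beyond what the abstract specifies and is not sketched here.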