Importance Sampling for Fair Policy Selection
Shayan Doroudi, Philip S. Thomas, Emma Brunskill
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Sister Conferences Best Papers. Pages 5239-5243.
https://doi.org/10.24963/ijcai.2018/729
We consider the problem of off-policy policy selection in reinforcement learning: using historical data generated from running one policy to compare two or more policies. We show that approaches based on importance sampling can be unfair: they can select the worse of two policies more often than not. We then give an example that shows importance sampling is systematically unfair in a practically relevant setting; namely, we show that it unreasonably favors shorter trajectory lengths. We then present sufficient conditions to theoretically guarantee fairness. Finally, we provide a practical importance sampling-based estimator to help mitigate the unfairness due to varying trajectory lengths.
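For readers unfamiliar with the estimator the abstract builds on, the following is a minimal sketch of ordinary importance sampling for off-policy evaluation. The function names and the toy policies are illustrative assumptions, not taken from the paper; the per-trajectory weight is the product of per-step likelihood ratios between the evaluation and behavior policies, which is why longer trajectories tend to receive more extreme weights.

```python
def importance_sampling_estimate(trajectories, pi_e, pi_b):
    """Ordinary importance sampling (IS) estimate of the value of an
    evaluation policy pi_e, using trajectories collected under a
    behavior policy pi_b.

    Each trajectory is a list of (state, action, reward) tuples;
    pi_e(a, s) and pi_b(a, s) return action probabilities.
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0  # product of per-step likelihood ratios
        ret = 0.0     # undiscounted return of this trajectory
        for (s, a, r) in traj:
            weight *= pi_e(a, s) / pi_b(a, s)
            ret += r
        total += weight * ret
    return total / len(trajectories)


# Toy example (hypothetical): uniform behavior policy, evaluation
# policy that prefers action 0. A one-step trajectory gets weight
# 0.9 / 0.5 = 1.8; a two-step trajectory where pi_e agrees at both
# steps gets weight 1.8**2 = 3.24 -- the weight compounds with length.
pi_e = lambda a, s: 0.9 if a == 0 else 0.1
pi_b = lambda a, s: 0.5

short_traj = [(0, 0, 1.0)]
long_traj = [(0, 0, 0.5), (0, 0, 0.5)]
print(importance_sampling_estimate([short_traj], pi_e, pi_b))  # 1.8
print(importance_sampling_estimate([long_traj], pi_e, pi_b))   # 3.24
```

The compounding of per-step ratios is the mechanism behind the trajectory-length unfairness the abstract describes: trajectories of different lengths receive weights on very different scales, so length alone can dominate which policy the estimator appears to favor.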
Keywords:
Machine Learning: Reinforcement Learning
Uncertainty in AI: Sequential Decision Making