Adaptive Estimation Q-learning with Uncertainty and Familiarity

Xiaoyu Gong, Shuai Lü, Jiayu Yu, Sheng Zhu, Zongze Li

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 3750-3758. https://doi.org/10.24963/ijcai.2023/417

One of the key problems in model-free deep reinforcement learning is how to obtain more accurate value estimates. The most widely used off-policy algorithms currently suffer from overestimation or underestimation bias, which may lead to unstable policies. In this paper, we propose a novel method, Adaptive Estimation Q-learning (AEQ), which uses uncertainty and familiarity to control value estimation naturally and adapts it to each specific state-action pair. We theoretically prove a property of our familiarity term that can keep the expected estimation bias approximately zero, and we experimentally demonstrate that our dynamic estimation improves performance and prevents the bias from continuously increasing. We evaluate AEQ on several continuous control tasks, where it outperforms state-of-the-art methods. Moreover, AEQ is simple to implement and can be applied to any off-policy actor-critic algorithm.
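The abstract names the two controlling quantities but not their exact formulation. The sketch below is a minimal illustration, not the paper's actual method, of how an ensemble critic's uncertainty (spread across members) and a familiarity weight could jointly adjust a Q-learning target so that pessimism fades for well-visited state-action pairs. The function name aeq_style_target, the coefficient beta, and the scalar familiarity input are all hypothetical.

import numpy as np

def aeq_style_target(q_ensemble, reward, done, familiarity, gamma=0.99, beta=0.5):
    """Illustrative target: penalize the ensemble mean by its uncertainty
    (std across ensemble members), scaled down as familiarity grows, so the
    estimate moves from pessimistic toward (roughly) unbiased for
    frequently visited state-action pairs.

    q_ensemble:  array of shape (n_critics,), critics' Q-values at (s', a')
    familiarity: scalar in [0, 1]; 1 means a well-visited state-action pair
    """
    q_mean = q_ensemble.mean()
    uncertainty = q_ensemble.std()
    # Hypothetical combination: high familiarity shrinks the pessimism penalty.
    adjusted_q = q_mean - beta * (1.0 - familiarity) * uncertainty
    return reward + gamma * (1.0 - done) * adjusted_q

# Example: an unfamiliar pair keeps most of the uncertainty penalty.
target = aeq_style_target(np.array([10.2, 9.8, 10.5]),
                          reward=1.0, done=0.0, familiarity=0.3)

Because the adjustment is computed only when forming the bootstrap target, a term of this shape drops into any off-policy actor-critic update without changing the rest of the algorithm, which is consistent with the abstract's claim of broad applicability.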
Keywords:
Machine Learning: ML: Deep reinforcement learning
Machine Learning: ML: Ensemble methods