Offline Reinforcement Learning with Behavioral Supervisor Tuning
Padmanaba Srinivasan, William Knottenbelt
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 4929-4937.
https://doi.org/10.24963/ijcai.2024/545
Offline reinforcement learning (RL) algorithms learn performant, well-generalizing policies from a static dataset of interactions. Many recent offline RL methods have seen substantial success, with one key caveat: they demand extensive per-dataset hyperparameter tuning to achieve their reported performance, and evaluating each configuration requires policy rollouts in the environment; this can quickly become cumbersome. Heavy tuning requirements can also hamper the adoption of these algorithms in practical domains. In this paper, we present TD3 with Behavioral Supervisor Tuning (TD3-BST), an algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support. TD3-BST learns more effective policies from offline datasets than prior methods and achieves the best performance across challenging benchmarks without requiring per-dataset tuning.
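To make the high-level idea in the abstract concrete, below is a minimal sketch (not the paper's implementation) of a TD3-style policy update in which an uncertainty model, trained on dataset state-action pairs, penalizes actions outside the dataset support. The network architecture, the Softplus uncertainty head, the loss form, and the trade-off weight `lam` are all illustrative assumptions, not the formulation used in TD3-BST.

```python
# Sketch: uncertainty-guided TD3-style policy update.
# ASSUMPTIONS: the uncertainty model, its architecture, and the way the
# penalty enters the policy objective are illustrative placeholders,
# not the exact method proposed in TD3-BST.
import torch
import torch.nn as nn


class UncertaintyModel(nn.Module):
    """Scores (state, action) pairs; low output is assumed to mean
    the action lies within the dataset support."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # non-negative uncertainty
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def policy_loss(actor, critic, uncertainty_model, states):
    """TD3-style deterministic policy objective with an uncertainty penalty.

    The penalty term steers the actor toward actions the uncertainty
    model considers in-support; `lam` is a hypothetical trade-off weight
    (the paper's contribution is avoiding per-dataset tuning of such knobs).
    """
    actions = actor(states)
    q_value = critic(states, actions)          # assumed critic(s, a) -> Q(s, a)
    penalty = uncertainty_model(states, actions)
    lam = 1.0                                  # illustrative fixed weight
    return (-q_value + lam * penalty).mean()
```

In this sketch, the uncertainty model plays the role of a behavioral supervisor: maximizing Q-values alone would let the actor drift to out-of-distribution actions, while the penalty keeps updates anchored to the dataset support.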
Keywords:
Machine Learning: ML: Offline reinforcement learning
Machine Learning: ML: Reinforcement learning