Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation

Woo Kyung Kim, Minjong Yoo, Honguk Woo

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 4300-4307. https://doi.org/10.24963/ijcai.2024/475

Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity in addressing sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This limitation is particularly pronounced when multiple conflicting objectives exist and each expert may hold a unique optimization preference over those objectives, since obtaining comprehensive datasets covering all preferences is impractical. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates to regularize the discriminator. This enables the progressive generation of a set of policies that accommodate diverse preferences over the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. In doing so, we present a Pareto IRL framework (ParIRL) that establishes a Pareto policy set from these limited datasets. Within the framework, the Pareto policy set is then distilled into a single, preference-conditioned diffusion model, allowing users to immediately specify which expert's patterns they prefer. Through experiments, we show that ParIRL outperforms other IRL algorithms on various multi-objective control tasks, achieving a dense approximation of the Pareto frontier. We also demonstrate the applicability of ParIRL to autonomous driving in CARLA.
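
As a rough illustration of the discriminator regularization described in the abstract, the following PyTorch sketch shows a GAIL-style discriminator whose adversarial loss is augmented with a reward-distance penalty. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Discriminator`, `discriminator_loss`, and `reward_distance_weight` are hypothetical, and the computation of the reward-distance estimate itself is left abstract.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the authors' code): an adversarial IRL discriminator
# whose loss is regularized by an estimate of the distance between the reward it
# implies and a target interpolation between two experts' rewards.

class Discriminator(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # Logit distinguishing expert transitions from policy transitions.
        return self.net(torch.cat([obs, act], dim=-1))


def discriminator_loss(disc, expert_batch, policy_batch,
                       reward_distance, reward_distance_weight=0.1):
    """Standard adversarial IRL loss plus a reward-distance regularizer.

    `expert_batch` and `policy_batch` are (obs, act) tensor pairs;
    `reward_distance` is assumed to be a scalar tensor estimating how far the
    discriminator's implicit reward is from the desired interpolation between
    the two expert preferences (only sketched here).
    """
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(*expert_batch)
    policy_logits = disc(*policy_batch)
    adv_loss = (bce(expert_logits, torch.ones_like(expert_logits))
                + bce(policy_logits, torch.zeros_like(policy_logits)))
    return adv_loss + reward_distance_weight * reward_distance
```

In this sketch, the regularizer simply biases the discriminator toward a reward that sits at a controlled distance from the two expert-derived rewards, which is one way to read the paper's idea of progressively generating policies between the two given preferences.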
Keywords: Machine Learning: ML: Reinforcement learning