Adversarial Behavior Exclusion for Safe Reinforcement Learning

Md Asifur Rahman, Tongtong Liu, Sarra Alqahtani

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 483-491. https://doi.org/10.24963/ijcai.2023/54

Learning by exploration makes reinforcement learning (RL) potentially attractive for many real-world applications. However, the same exploration process makes RL inherently too vulnerable to deploy in applications where safety is of utmost importance. Most prior studies treat exploration as being at odds with safety and therefore restrict it, either through joint optimization of task and safety objectives or by imposing constraints for safe exploration. This paper departs from that convention and instead uses exploration as a key to safety: it learns safety as a robust behavior that completely excludes any behavioral pattern responsible for safety violations. Adversarial Behavior Exclusion for Safe RL (AdvEx-RL) learns a behavioral representation of the agent's safety violations by approximating an optimal adversary through exploration, and later uses this representation to learn a separate safety policy that excludes those unsafe behaviors. In addition, AdvEx-RL ensures safety in a task-agnostic manner by acting as a safety firewall, and can therefore be integrated with any RL task policy. We demonstrate the robustness of AdvEx-RL via comprehensive experiments in standard constrained Markov decision process (CMDP) environments under 2 white-box action-space perturbations, as well as under changes in environment dynamics, against 7 baselines. AdvEx-RL consistently outperforms the baselines, achieving an average safety performance of over 75% in the continuous action space with 10 times more variation in the testing environment dynamics. By using a standalone safety policy independent of conflicting objectives, AdvEx-RL also paves the way for interpretable safety behavior analysis, as we show in our user study.
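The "safety firewall" idea described above can be sketched in a few lines. This is a hypothetical, simplified illustration (the function names `safety_firewall` and `risk_fn`, the threshold, and the toy policies are assumptions, not the paper's implementation): a learned risk estimate, standing in for AdvEx-RL's adversarial behavior representation, decides whether the task policy's action passes through or is overridden by the standalone safety policy.

```python
# Hypothetical sketch of a task-agnostic safety firewall.
# risk_fn stands in for a learned estimate of how closely a
# (state, action) pair matches known unsafe behavioral patterns.

def safety_firewall(state, task_policy, safety_policy, risk_fn, threshold=0.5):
    """Pass the task action through unless it is flagged as unsafe."""
    action = task_policy(state)
    if risk_fn(state, action) > threshold:
        # Override with the standalone safety policy's action.
        action = safety_policy(state)
    return action

# Toy illustration: states near a hazard (state > 0.8) are risky.
task_policy = lambda s: "forward"
safety_policy = lambda s: "brake"
risk_fn = lambda s, a: 1.0 if s > 0.8 else 0.0

print(safety_firewall(0.2, task_policy, safety_policy, risk_fn))  # task action passes
print(safety_firewall(0.9, task_policy, safety_policy, risk_fn))  # safety override
```

Because the firewall only inspects the proposed action rather than the task policy's internals, any task policy can be wrapped this way, which is what makes the approach task-agnostic.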
Keywords:
AI Ethics, Trust, Fairness: ETF: Safety and robustness
Machine Learning: ML: Reinforcement learning
AI Ethics, Trust, Fairness: ETF: Trustworthy AI
AI Ethics, Trust, Fairness: ETF: Explainability and interpretability