Poisoning the Well: Can We Simultaneously Attack a Group of Learning Agents?

Ridhima Bector, Hang Xu, Abhay Aradhya, Chai Quek, Zinovi Rabinovich

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 3470-3478. https://doi.org/10.24963/ijcai.2023/386

Reinforcement Learning's (RL) ubiquity has instigated research on potential threats to its training and deployment. Many works study single-learner training-time attacks that "pre-programme" behavioral triggers into a strategy. However, attacks on collections of learning agents remain largely overlooked. We remedy the situation by developing a constructive training-time attack on a population of learning agents that is, moreover, agnostic to the population's size. The attack constitutes a sequence of environment (re)parameterizations (poisonings), generated to overcome individual differences between agents and lead the entire population to the same target behavior while minimizing effective environment modulation. Our method is demonstrated on populations of independent learners in "ghost" environments (learners neither interact with nor perceive each other) as well as in environments with mutual awareness, with or without individual learning. From the attack perspective, we pursue an ultra-blackbox setting, i.e., the attacker's training utilizes only across-policy traces of the victim learners for both attack conditioning and evaluation. The resulting uncertainty in population behavior is managed via a novel Wasserstein distance-based Gaussian embedding of the behaviors detected within the victim population. To align with prior works on environment poisoning, our experiments are based on a 3D Grid World domain and show: a) feasibility, i.e., despite the uncertainty, the attack forces a population-wide adoption of the target behavior; b) efficacy, i.e., the attack is size-agnostic and transferable. Code and Appendices are available at "bit.ly/github-rb-cep".
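To illustrate the kind of machinery the abstract alludes to, the sketch below (not the authors' code; see "bit.ly/github-rb-cep" for the actual implementation) fits a Gaussian embedding to each victim's behavior trace and compares embeddings with the closed-form 2-Wasserstein distance between Gaussians. The feature layout of `trace` and the helper names are illustrative assumptions.

```python
# Hypothetical sketch: Gaussian embeddings of behavior traces compared via
# the closed-form 2-Wasserstein distance between Gaussian distributions.
import numpy as np
from scipy.linalg import sqrtm

def gaussian_embedding(trace):
    """Fit N(mu, Sigma) to a trace of behavior features.

    `trace` is assumed to be an (n_steps, d) array of per-step features
    (e.g., state-action encodings) collected from one victim learner.
    """
    mu = trace.mean(axis=0)
    # Small diagonal jitter keeps the covariance positive definite.
    sigma = np.cov(trace, rowvar=False) + 1e-6 * np.eye(trace.shape[1])
    return mu, sigma

def w2_gaussian(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between two Gaussians."""
    sqrt_s2 = sqrtm(sigma2)
    cross = sqrtm(sqrt_s2 @ sigma1 @ sqrt_s2)
    # Discard tiny imaginary parts introduced by sqrtm's numerics.
    bures = np.trace(sigma1 + sigma2 - 2.0 * np.real(cross))
    return float(np.sum((mu1 - mu2) ** 2) + bures)

# Example: two behavior traces drawn from slightly different policies.
rng = np.random.default_rng(0)
trace_a = rng.normal(0.0, 1.0, size=(500, 4))
trace_b = rng.normal(0.3, 1.2, size=(500, 4))
print(w2_gaussian(*gaussian_embedding(trace_a), *gaussian_embedding(trace_b)))
```

A distance of this form gives the attacker a scalar summary of how far apart two observed behaviors are, which is one plausible way to condition the attack on population-level uncertainty without access to the learners' internals.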
Keywords:
Machine Learning: ML: Adversarial machine learning
Agent-based and Multi-agent Systems: MAS: Multi-agent learning
Machine Learning: ML: Deep reinforcement learning