Learning from Demonstrations with High-Level Side Information

Learning from Demonstrations with High-Level Side Information

Min Wen, Ivan Papusha, Ufuk Topcu

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 3055-3061. https://doi.org/10.24963/ijcai.2017/426

We consider the problem of learning from demonstration, where extra side information about the demonstration is encoded as a co-safe linear temporal logic formula. We address two known limitations of existing methods that do not account for such side information. First, the policies that result from existing methods, while matching the expected features or likelihood of the demonstrations, may still be in conflict with high-level objectives not explicit in the demonstration trajectories. Second, existing methods fail to provide a priori guarantees on the out-of-sample generalization performance with respect to such high-level goals. This lack of formal guarantees can prevent the application of learning from demonstration to safety- critical systems, especially when inference to state space regions with poor demonstration coverage is required. In this work, we show that side information, when explicitly taken into account, indeed improves the performance and safety of the learned policy with respect to task implementation. Moreover, we describe an automated procedure to systematically generate the features that encode side information expressed in temporal logic.
Keywords:
Machine Learning: Reinforcement Learning
Planning and Scheduling: Markov Decisions Processes
Robotics and Vision: Task learning, learning from demonstration, Adaptive control
Agent-based and Multi-agent Systems: Formal verification, validation and synthesis