Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains

Till Hofmann, Hector Geffner

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 6733-6742. https://doi.org/10.24963/ijcai.2024/744

General policies represent reactive strategies for solving large families of planning problems, such as the infinite collection of solvable instances of a given domain. Methods for learning such policies from collections of small training instances have been developed successfully for classical domains. In this work, we extend these formulations and the resulting combinatorial learning methods to fully observable, non-deterministic (FOND) domains. We also evaluate the resulting approach experimentally on a number of FOND planning benchmark domains, present the general policies that result in some of these domains, and prove their correctness. The method for learning general policies for FOND planning can in fact be seen as an alternative FOND planning method that searches for solutions not in the given state space but in an abstract state space defined by features that must be learned as well.
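To make the idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of what a general policy over learned features looks like: a rule maps a feature condition to a desired feature change, and the same rule solves every instance of the domain regardless of size. The toy domain, feature name `n`, and the functions `general_policy` and `run_instance` are all illustrative assumptions.

```python
# Hypothetical sketch of a general policy expressed over a numerical
# feature, applicable to instances of any size. The domain, feature,
# and function names are illustrative, not taken from the paper.
import random

def general_policy(features):
    """Return the desired feature change, or None if the goal holds.

    Policy rule (illustrative): if n > 0, apply an action that may
    decrement n and never increments it.
    """
    if features["n"] > 0:
        return {"n": "dec"}
    return None

def run_instance(n0, seed=0):
    """Execute the policy on an instance with n0 items remaining.

    The action is non-deterministic: it either decrements n or leaves
    it unchanged. Under a fairness assumption (the successful outcome
    eventually occurs), the policy is a strong-cyclic solution: it
    reaches the goal n == 0 on every instance.
    """
    rng = random.Random(seed)
    n = n0
    steps = 0
    while general_policy({"n": n}) is not None:
        if rng.random() < 0.5:  # non-deterministic outcome of the action
            n -= 1
        steps += 1
    return n, steps
```

For example, `run_instance(12)` reaches the goal `n == 0`, taking at least 12 steps since failed outcomes must be retried; the same two-line policy solves the instance with a thousand items just as well, which is what makes the policy "general".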
Keywords:
Planning and Scheduling: PS: Learning in planning and scheduling
Planning and Scheduling: PS: Planning under uncertainty