
E1 - Evaluating Machine Learning and Knowledge Discovery

Sunday, AM

Foster Provost & David Jensen

An increasing proportion of AI systems discover and apply new knowledge. Obvious examples are dedicated systems for machine learning and data mining; further examples include planning, scheduling, problem solving, and robotic systems that embed learning in a larger context. In addition, many AI systems are the product of careful experimentation and tuning, a form of interactive knowledge discovery.

This tutorial examines the central question of how to evaluate discovered knowledge. Such evaluation can be carried out by a researcher or by an AI system itself. In either case, careful evaluation is the key to improving learned knowledge and to using it effectively.

The tutorial will cover four general topics. First, it will examine the fundamentals of empirical evaluation of learned knowledge, including basic challenges, statistical foundations, useful statistical and visualization techniques, and specific pitfalls. Second, it will discuss how to evaluate learned knowledge in the context of the goals and problem characteristics of a specific task, focusing on techniques for evaluating knowledge under uncertainty about task parameters such as error costs and class frequencies. Third, it will examine the particular challenges of evaluating knowledge that is derived inductively, concentrating on unifying ideas from statistics, computational learning theory, and minimum description length formalisms. Finally, it will address the challenges of open-ended knowledge discovery, surveying insights from a wide body of work ranging from AM, EURISKO, and MetaDENDRAL through more recent work in scientific discovery and data mining.
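
To give a flavor of the second topic, the sketch below compares two classifiers by expected cost across a range of class frequencies rather than by a single accuracy figure. It is illustrative only and not drawn from the tutorial materials: the error rates and the unit-cost model are hypothetical assumptions.

    # Illustrative sketch: expected misclassification cost as a function of
    # positive-class frequency, with fixed unit error costs.  The error
    # rates below are hypothetical, not results from the tutorial.
    classifiers = {
        "rule_learner":  (0.05, 0.30),  # (false-positive rate, false-negative rate)
        "decision_tree": (0.15, 0.10),
    }

    def expected_cost(fp_rate, fn_rate, p_pos, cost_fp=1.0, cost_fn=1.0):
        """Expected cost per example at a given positive-class frequency."""
        return (1 - p_pos) * fp_rate * cost_fp + p_pos * fn_rate * cost_fn

    # Sweep over plausible positive-class frequencies; the preferred
    # classifier changes with the operating conditions.
    for p_pos in (0.01, 0.10, 0.50):
        costs = {name: expected_cost(fp, fn, p_pos)
                 for name, (fp, fn) in classifiers.items()}
        best = min(costs, key=costs.get)
        summary = ", ".join(f"{n}={c:.4f}" for n, c in costs.items())
        print(f"P(positive)={p_pos:.2f}: {summary} -> prefer {best}")

Neither classifier dominates: the preferred one flips as the positive class becomes more common, which is why evaluation under uncertain error costs and class frequencies requires more than a single summary statistic.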

Prerequisite knowledge:
The tutorial assumes almost no prior background in statistics, though audience members should be familiar with basic machine learning algorithms for classification and reinforcement learning. The tutorial is best suited to researchers who are building systems with a learning component, and to researchers constructing dedicated machine learning and data mining systems.

Foster Provost studies knowledge discovery and machine learning at Bell Atlantic (formerly NYNEX) Science and Technology. His research has focused on evaluation, scaling up, and using background knowledge, and on applications such as fraud detection and network diagnosis. With Ron Kohavi, Foster coedited a recent special issue of the journal Machine Learning on "Applications of Machine Learning and the Knowledge Discovery Process."

David Jensen is a research assistant professor of computer science at the University of Massachusetts, Amherst. His research focuses on learning and knowledge discovery, and he has written and spoken extensively on statistical pathologies of learning algorithms. He is managing editor of Evaluation of Intelligent Systems, a web-accessible resource about statistical evaluation methods for studying AI systems.

Foster and David presented a tutorial on evaluating data mining algorithms at the 1998 International Conference on Knowledge Discovery and Data Mining (KDD-98).

