Machine Learning and Information Filtering on the Internet
Michael Pazzani
Course Description
The vast amount of information available on the Internet has given rise
to a number of agents for locating relevant, useful or interesting
information for an individual. These agents perform tasks such as
prioritizing, filtering, or sorting electronic mail; filtering
newsgroup articles and locating interesting articles in unread newsgroups;
guiding a user to relevant information on the World Wide Web;
notifying a user when a significant change occurs on a web site; and
providing access to information relevant to a user's current tasks.
To perform such tasks, a profile of the user's interests must be
created. In this tutorial, we will focus on the learning and
representation of user profiles, the methods for collecting user
feedback, and the representation of information sources. This tutorial
will review a variety of findings from several decades of research on
information retrieval, focusing on approaches to information filtering
and classification. Next, machine learning approaches to classification
will be described, including decision trees, nearest neighbor algorithms,
Bayesian classifiers and neural networks. We will discuss how they may
be used to learn user profiles. The relationship between machine
learning and classic approaches from information retrieval will be
discussed. Finally, recent developments such as collaborative filtering,
efficient rule learners, combining multiple models, weighted majority
algorithms and infinite attribute models will be described.
The technology will be illustrated with examples from a variety of
information agents including LIRA, NewsWeeder, WebWatcher, WebDoggie,
InfoFinder, Inquery, Letizia, firefly, Syskill & Webert,
DICA and the Remembrance Agent.
The intended audience of this tutorial is practitioners and researchers
interested in issues involved with applying machine learning and
information retrieval algorithms to classification and ranking of
information on the Internet. Basic familiarity with mathematics and
probability will be assumed.
About the Lecturer
Michael Pazzani
received an M.S. degree in computer science specializing in Natural
Language Processing in 1980, and a Ph.D. in computer science
specializing in Machine Learning from UCLA in 1987. He is now a
professor and department chair of Information and Computer Science at
the University of California, Irvine. He has been active in Machine
Learning research for the past decade with numerous publications in the
IJCAI, AAAI, Cognitive Science and the International Machine Learning
Conferences.
higuchi@etl.go.jp
Last modified: Thu Feb 20 13:18:52 JST 1997