Abstract

Distributed Data Mining: Why Do More Than Aggregating Models

Mohamed Aoun-Allah, Guy Mineau

In this paper we address the problem of mining large distributed databases. We show that aggregating models, i.e., sets of disjoint classification rules, each built over a subdatabase, is sufficient to obtain an aggregated model that is both predictive and descriptive, that offers excellent prediction capability, and that is conceptually much simpler than comparable techniques. These results are made possible by lifting the disjoint cover constraint on the aggregated model and by using a confidence coefficient associated with each rule in a weighted majority vote.
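
The weighted majority vote over pooled rules can be illustrated with a minimal sketch. It is not the authors' implementation: the `Rule` type, the `weighted_vote` function, the example attributes, and the confidence values are all hypothetical, and the abstract does not specify how the confidence coefficient is computed or how ties and uncovered examples are handled. The sketch only shows that, once the disjoint cover constraint is lifted, several rules may cover the same example and their confidences are summed per class.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    # All fields are illustrative assumptions, not taken from the paper.
    condition: Callable[[Dict[str, float]], bool]  # True when the rule covers the example
    label: str          # class predicted by the rule
    confidence: float   # assumed confidence coefficient in [0, 1]


def weighted_vote(rules: List[Rule], example: Dict[str, float]) -> Optional[str]:
    """Return the class with the largest summed confidence among covering rules."""
    scores: Dict[str, float] = defaultdict(float)
    for rule in rules:
        if rule.condition(example):
            scores[rule.label] += rule.confidence
    if not scores:
        return None  # no rule covers the example (handling is an assumption)
    return max(scores, key=scores.get)


# Rule sets built independently at two hypothetical sites.
site_a = [Rule(lambda x: x["age"] < 30, "yes", 0.9)]
site_b = [
    Rule(lambda x: x["income"] > 50, "no", 0.6),
    Rule(lambda x: x["age"] < 40, "yes", 0.5),
]

# Aggregation is plain pooling: the combined rules may overlap on an example.
aggregated = site_a + site_b

prediction = weighted_vote(aggregated, {"age": 25, "income": 80})
print(prediction)
```

In this toy example all three rules cover the query, so the vote compares a summed confidence of 1.4 for "yes" against 0.6 for "no".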