Abstract
Bernoulli Random Forests: Closing the Gap between Theoretical Consistency and Empirical Soundness / 2167
Yisen Wang, Qingtao Tang, Shu-Tao Xia, Jia Wu, Xingquan Zhu
Random forests are one type of the most effective ensemble learning methods. In spite of their sound empirical performance, the study on their theoretical properties has been left far behind. Recently, several random forests variants with nice theoretical basis have been proposed, but they all suffer from poor empirical performance. In this paper, we propose a Bernoulli random forests model (BRF), which intends to close the gap between the theoretical consistency and the empirical soundness of random forests classification. Compared to Breiman's original random forests, BRF makes two simplifications in tree construction by using two independent Bernoulli distributions. The first Bernoulli distribution is used to control the selection of candidate attributes for each node of the tree, and the second one controls the splitting point used by each node. As a result, BRF enjoys proved theoretical consistency, so its accuracy will converge to optimum (i.e., the Bayes risk) as the training data grow infinitely large. Empirically, BRF demonstrates the best performance among all theoretical random forests, and is very comparable to Breiman's original random forests (which do not have the proved consistency yet). The theoretical and experimental studies advance the research one step further towards closing the gap between the theory and the practical performance of random forests classification.