Consistency Measures for Feature Selection: A Formal Definition, Relative Sensitivity Comparison, and a Fast Algorithm

Kilho Shin, Danny Fernandes, Seiya Miyazaki

Consistency-based feature selection is an important category of feature selection research yet is defined only intuitively in the literature. First, we formally define a consistency measure, and then using this definition, evaluate 19 feature selection measures from the literature. While only 5 of these were labeledas consistency measures by their original authors, by our definition, an additional 9 measures should be classified as consistency measures. To compare these 14 consistency measures in terms of sensitivity, we introduce the concept of quasilinear compatibility order, and partially determine the order among the measures. Next, we proposea new fast algorithm for consistency-based feature selection. We ran experiments using eleven large datasets to compare the performance of our algorithm against INTERACT and LCC, the only two instances of consistency-based algorithms with potential real world application. Our algorithm shows vast improvement in time efficiency, while its performance in accuracy is comparable with that of INTERACT and LCC.