gRegress: Extracting Features from Graph Transactions for Regression

In this work we propose gRegress, a new algorithm which given set of labeled graphs and a real value associated with each graph extracts the complete set of subgraphs such that a) each subgraph in this set has correlation with the real value above a user-specified threshold and b) each subgraph in this set has correlation with any other subgraph in the set below a user-specified threshold. gRegress incorporates novel pruning mechanisms based on correlation of a subgraph feature with the output and correlation with other subgraph features. These pruning mechanisms lead to significant speedup. Experimental results indicate that in terms of runtime, gRegress substantially outperforms gSpan, often by an order of magnitude while the regression models produced by both approaches have comparable accuracy.

Nikhil S. Ketkar, Lawrence B. Holder, Diane J. Cook