Sampling for Approximate Bipartite Network Projection

Sampling for Approximate Bipartite Network Projection

Nesreen Ahmed, Nick Duffield, Liangzhen Xia

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 3286-3292. https://doi.org/10.24963/ijcai.2018/456

Bipartite graphs manifest as a stream of edges that represent transactions, e.g., purchases by retail customers. Recommender systems employ neighborhood-based measures of node similarity, such as the pairwise number of common neighbors (CN) and related metrics. While the number of node pairs that share neighbors is potentially enormous, only a relatively small proportion of them have many common neighbors. This motivates finding a weighted sampling approach to preferentially sample these node pairs. This paper presents a new sampling algorithm that provides a fixed size unbiased estimate of the similarity matrix resulting from a bipartite edge stream projection. The algorithm has two components. First, it maintains a reservoir of sampled bipartite edges with sampling weights that favor selection of high similarity nodes. Second, arriving edges generate a stream of similarity updates, based on their adjacency with the current sample. These updates are aggregated in a second reservoir sample-based stream aggregator to yield the final unbiased estimate. Experiments on real world graphs show that a 10% sample at each stage yields estimates of high similarity edges with weighted relative errors of about 1%.
Keywords:
Machine Learning: Data Mining
Machine Learning: Relational Learning
Machine Learning: Time-series;Data Streams
Machine Learning Applications: Networks