Massively Parallel Single-Source SimRanks in o(log n) Rounds
Siqiang Luo, Zulun Zhu
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 2252-2260.
https://doi.org/10.24963/ijcai.2024/249
SimRank is one of the most fundamental measures of structural similarity between two nodes in a graph, and it has been applied in a plethora of data mining and machine learning tasks. These tasks often involve single-source SimRank computation, which evaluates the SimRank values between a source node u and all other nodes. Due to its high computational complexity, single-source SimRank computation on large graphs is notoriously challenging, and hence recent studies resort to distributed processing. To our surprise, although SimRank has been widely adopted for two decades, the theoretical aspects of distributed SimRank computation with provable guarantees have rarely been studied.
In this paper, we conduct a theoretical study of single-source SimRank computation in the Massively Parallel Computation (MPC) model, which is the standard theoretical framework for modeling distributed systems. Existing distributed SimRank algorithms require either Ω(log n) communication rounds or Ω(n) space per machine for a graph of n nodes. We overcome this barrier.
In particular, given a graph of n nodes, for any source node u and constant error ϵ>3/n, we show that O(log² log n) rounds of communication among machines are enough to compute single-source SimRank values with absolute error at most ϵ, while each machine needs only space sublinear in n. To the best of our knowledge, this is the first single-source SimRank algorithm in MPC that overcomes the Θ(log n) round complexity barrier with provable result accuracy.
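For concreteness, the single-source SimRank quantity described above can be illustrated with a minimal single-machine sketch. This is not the MPC algorithm of the paper: it is the textbook naive iteration of the standard SimRank recurrence, and the function name `single_source_simrank`, the decay factor 0.6, the iteration count, and the toy graph are all assumptions chosen for illustration.

```python
from itertools import product


def single_source_simrank(in_neighbors, source, decay=0.6, iterations=10):
    """Naive iterative SimRank on one machine (illustrative only).

    in_neighbors maps each node to the list of its in-neighbors.
    Returns {v: s(source, v)} for every node v.
    """
    nodes = list(in_neighbors)
    # Base case: s_0(a, b) = 1 if a == b, else 0.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar only to itself.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, scaled by decay.
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    # Single-source query: keep only the row for the source node.
    return {v: sim[(source, v)] for v in nodes}


# Hypothetical toy graph with edges a->b, a->c, b->d, c->d.
in_neighbors = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
scores = single_source_simrank(in_neighbors, "b")
```

The sketch stores all n² pairwise values and needs one pass per iteration, which is exactly the kind of cost that motivates distributed algorithms with sublinear space per machine and few communication rounds.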
Keywords:
Data Mining: DM: Parallel, distributed and cloud-based high performance mining
Data Mining: DM: Theoretical foundations of data mining