Distributed Pareto Optimization for Subset Selection

Distributed Pareto Optimization for Subset Selection

Chao Qian, Guiying Li, Chao Feng, Ke Tang

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 1492-1498. https://doi.org/10.24963/ijcai.2018/207

The subset selection problem that selects a few items from a ground set arises in many applications such as maximum coverage, influence maximization, sparse regression, etc. The recently proposed POSS algorithm is a powerful approximation solver for this problem. However, POSS requires centralized access to the full ground set, and thus is impractical for large-scale real-world applications, where the ground set is too large to be stored on one single machine. In this paper, we propose a distributed version of POSS (DPOSS) with a bounded approximation guarantee. DPOSS can be easily implemented in the MapReduce framework. Our extensive experiments using Spark, on various real-world data sets with size ranging from thousands to millions, show that DPOSS can achieve competitive performance compared with the centralized POSS, and is almost always better than the state-of-the-art distributed greedy algorithm RandGreeDi.
Keywords:
Heuristic Search and Game Playing: Heuristic Search
Heuristic Search and Game Playing: Combinatorial Search and Optimisation
Heuristic Search and Game Playing: Heuristic Search and Machine Learning
Heuristic Search and Game Playing: Distributed Search