Efficient Pruning of Large Knowledge Graphs

Stefano Faralli, Irene Finocchi, Simone Paolo Ponzetto, Paola Velardi

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 4055-4063. https://doi.org/10.24963/ijcai.2018/564

In this paper we present an efficient and highly accurate algorithm to prune noisy or over-ambiguous knowledge graphs, given as input an extensional definition of a domain of interest, namely a set of instances or concepts. Our method climbs the graph in a bottom-up fashion, iteratively layering the graph and pruning nodes and edges in each layer while not compromising the connectivity of the set of input nodes. Iterative layering and protection of pre-defined nodes make it possible to extract semantically coherent DAG structures from noisy or over-ambiguous cyclic graphs, without loss of information and without incurring computational bottlenecks, which are the main problem of state-of-the-art methods for cleaning large, i.e., Web-scale, knowledge graphs. We apply our algorithm to the task of pruning automatically acquired taxonomies, using benchmarking data from a SemEval evaluation exercise, and to the extraction of a domain-adapted taxonomy from the Wikipedia category hierarchy. The results show the superiority of our approach over state-of-the-art algorithms in terms of both output quality and computational efficiency.
Keywords:
Natural Language Processing: Knowledge Extraction
Multidisciplinary Topics and Applications: AI and the Web