Persistent Homology: An Introduction and a New Text Representation for Natural Language Processing / 1953
Xiaojin Zhu

Persistent homology is a mathematical tool from topological data analysis. It performs multi-scale analysis on a set of points and identifies clusters, holes, and voids therein. These latter topological structures complement standard feature representations, making persistent homology an attractive feature extractor for artificial intelligence. Research on persistent homology for AI is in its infancy, and is currently hindered by two issues: the lack of an accessible introduction to AI researchers, and the paucity of applications. In response, the first part of this paper presents a tutorial on persistent homology specifically aimed at a broader audience without sacrificing mathematical rigor. The second part contains one of the first applications of persistent homology to natural language processing. Specifically, our Similarity Filtration with Time Skeleton (SIFTS) algorithm identifies holes that can be interpreted as semantic "tie-backs" in a text document, providing a new document structure representation. We illustrate our algorithm on documents ranging from nursery rhymes to novels, and on a corpus with child and adolescent writings.