Lexical Sememe Prediction via Word Embeddings and Matrix Factorization

Ruobing Xie, Xingchi Yuan, Zhiyuan Liu, Maosong Sun

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 4200-4206. https://doi.org/10.24963/ijcai.2017/587

Sememes are defined as the minimum semantic units of human languages. People have manually annotated lexical sememes for words and formed linguistic knowledge bases. However, manual construction is time-consuming and labor-intensive, and it introduces significant annotation inconsistency and noise. In this paper, we explore, for the first time, automatically predicting lexical sememes from the semantic meanings of words encoded by word embeddings. Moreover, we apply matrix factorization to learn semantic relations between sememes and words. In experiments, we use the real-world sememe knowledge base HowNet for training and evaluation, and the results demonstrate the effectiveness of our method for lexical sememe prediction. Our method will be of great use for verifying annotations in existing noisy sememe knowledge bases and for suggesting annotations for new words and phrases.
Keywords:
Natural Language Processing: Natural Language Semantics
Natural Language Processing: Natural Language Processing
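
To make the idea of predicting sememes from word embeddings concrete, the following is a minimal sketch, not the paper's actual model. It assumes a simple nearest-neighbor voting scheme: an unannotated word receives the sememes of its most similar annotated words, weighted by cosine similarity. The abstract does not specify the scoring function, so the function names (`cosine`, `predict_sememes`), the parameter `k`, and the toy vectors and sememe labels are all hypothetical and are not taken from HowNet. The matrix factorization component is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_sememes(target_vec, embeddings, annotations, k=2):
    """Rank candidate sememes for a word given only its embedding.

    Assumption (not from the paper): each of the k most similar
    annotated words votes for its own sememes, and each vote is
    weighted by that word's cosine similarity to the target.
    """
    neighbors = sorted(
        (
            (cosine(target_vec, vec), word)
            for word, vec in embeddings.items()
            if word in annotations
        ),
        reverse=True,
    )[:k]

    scores = {}
    for sim, word in neighbors:
        for sememe in annotations[word]:
            scores[sememe] = scores.get(sememe, 0.0) + sim

    # Highest-scoring sememes first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 3-dimensional embeddings and made-up sememe labels.
embeddings = {
    "apple": [0.9, 0.1, 0.0],
    "pear": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
annotations = {
    "apple": {"fruit", "edible"},
    "pear": {"fruit"},
    "car": {"vehicle"},
}

# Embedding of a new, unannotated word (hypothetical values).
new_word_vec = [0.85, 0.15, 0.05]
ranked = predict_sememes(new_word_vec, embeddings, annotations, k=2)
print(ranked)
```

With this toy data the two nearest annotated words are "apple" and "pear", so "fruit" collects two votes and ranks above "edible", while "vehicle" receives none. A ranked list of this kind could serve either use case named in the abstract: flagging existing annotations that score poorly, or suggesting sememes for words that have no annotation yet.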