A Latent Variable Model for Learning Distributional Relation Vectors

Jose Camacho-Collados, Luis Espinosa-Anke, Shoaib Jameel, Steven Schockaert

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
Main track. Pages 4911-4917. https://doi.org/10.24963/ijcai.2019/682

Recently, a number of unsupervised approaches have been proposed for learning vectors that capture the relationship between two words. Inspired by word embedding models, these approaches rely on co-occurrence statistics obtained from sentences in which the two target words appear. However, the number of such sentences is often quite small, and most of the words that occur in them are not relevant for characterizing the considered relationship. As a result, standard co-occurrence statistics typically lead to noisy relation vectors. To address this issue, we propose a latent variable model that aims to explicitly determine which words from the given sentences best characterize the relationship between the two target words. Relation vectors then correspond to the parameters of a simple unigram language model which is estimated from these words.
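The core idea sketched in the abstract — a latent variable per context word indicating whether it characterizes the relation, with the relation vector given by the parameters of a unigram language model estimated from those words — can be illustrated with a simple two-component mixture fit by EM. This is a hedged, simplified sketch, not the paper's exact model: the mixing weight `lam`, the toy background distribution, and the example sentence words are all illustrative assumptions.

```python
from collections import Counter

def estimate_relation_unigram(context_words, background, lam=0.5, iters=50):
    """Sketch of a latent-variable unigram model (not the paper's exact model).

    Each word in the sentences mentioning the target pair is assumed to be
    generated either from a relation-specific unigram model theta (with
    probability lam) or from a corpus-wide background model (with probability
    1 - lam). The latent variable is which component produced each word; EM
    re-estimates theta, which then plays the role of the relation vector.
    """
    counts = Counter(context_words)
    vocab = list(counts)
    # Initialize theta uniformly over the observed vocabulary.
    theta = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iters):
        # E-step: posterior probability that each word type was generated
        # by the relation model rather than the background model.
        post = {}
        for w in vocab:
            rel = lam * theta[w]
            bg = (1.0 - lam) * background.get(w, 1e-9)
            post[w] = rel / (rel + bg)
        # M-step: re-estimate theta from the expected relation-word counts.
        total = sum(counts[w] * post[w] for w in vocab)
        theta = {w: counts[w] * post[w] / total for w in vocab}
    return theta

# Hypothetical example: words from sentences containing a (person, city) pair.
context = "was born in the city of".split()
# Toy background probabilities: frequent function words get high mass.
background = {"was": 0.05, "born": 0.001, "in": 0.06,
              "the": 0.08, "city": 0.002, "of": 0.07}
theta = estimate_relation_unigram(context, background)
```

After EM, content words that are rare in the background (such as "born" and "city") absorb most of the relation model's probability mass, while function words are explained by the background component — which is the de-noising effect the abstract describes.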
Keywords:
Natural Language Processing: Natural Language Semantics
Natural Language Processing: Natural Language Processing
Natural Language Processing: Embeddings