A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings

Liangchen Wei, Zhi-Hong Deng

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 4165-4171. https://doi.org/10.24963/ijcai.2017/582

Cross-lingual learning allows training data from one language to be used to build models for another. Many traditional approaches require word-level alignments from parallel corpora; in this paper we define a general bilingual training objective that requires only a sentence-level parallel corpus. We propose a variational autoencoding approach to training bilingual word embeddings. The variational model introduces a continuous latent variable that explicitly models the underlying semantics of parallel sentence pairs and guides their generation. Our model constrains the bilingual word embeddings to represent words of both languages in exactly the same continuous vector space. Empirical results on the task of cross-lingual document classification show that our method is effective.
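The core idea of the abstract can be illustrated with a minimal sketch: a shared Gaussian latent variable z is inferred from a sentence (here via a mean-of-embeddings encoder, an assumption for the demo, not the authors' architecture), sampled with the reparameterization trick, and regularized toward a standard normal prior as in a variational autoencoder. All dimensions, weight matrices, and the toy sentence below are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's code): a bilingual VAE in which a
# shared continuous latent z models the semantics of a parallel sentence
# pair. Both languages' embeddings live in one vector space.

rng = np.random.default_rng(0)

vocab_src, d_emb, d_z = 50, 16, 8

# Bilingual word embeddings in a single shared continuous space
# (a second-language table E_tgt would have the same d_emb).
E_src = rng.normal(scale=0.1, size=(vocab_src, d_emb))

# Inference network: sentence vector -> Gaussian posterior q(z | x).
W_mu = rng.normal(scale=0.1, size=(d_emb, d_z))
W_logvar = rng.normal(scale=0.1, size=(d_emb, d_z))

def encode(word_ids, E):
    """Mean of word embeddings as a simple sentence representation."""
    h = E[word_ids].mean(axis=0)
    return h @ W_mu, h @ W_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), the ELBO regularizer."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

src_sentence = np.array([3, 17, 42])  # toy source-language word ids
mu, logvar = encode(src_sentence, E_src)
z = reparameterize(mu, logvar)

# In the full model, this same z would parameterize decoders generating
# both sides of the parallel pair, tying the two languages together.
print(z.shape, kl_to_standard_normal(mu, logvar) >= 0.0)
```

Training would maximize the ELBO: reconstruction likelihood of both sentences given z, minus this KL term.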
Keywords:
Natural Language Processing: Natural Language Semantics
Natural Language Processing: Text Classification