Abstract
A Bilingual Graph-Based Semantic Model for Statistical Machine Translation
Rui Wang, Hai Zhao, Sabine Ploux, Bao-Liang Lu, Masao Utiyama
Most existing bilingual embedding methods for Statistical Machine Translation (SMT) suffer from two obvious drawbacks. First, they build word embeddings only from simple context, such as word counts and co-occurrences within a document or a sliding window, and so ignore potentially useful latent information in selected context. Second, the word sense, rather than the word form, should be treated as the minimal semantic unit, yet most existing work still learns representations of word forms. This paper presents a Bilingual Graph-based Semantic Model (BGSM) to alleviate these shortcomings. By using maximum complete subgraphs (cliques) for context selection, BGSM can effectively model word sense representations instead of the word forms themselves. The proposed model is applied to phrase pair translation probability estimation and generation in SMT. Empirical results show that BGSM improves SMT in both performance (up to +1.3 BLEU) and efficiency compared with existing methods.
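The idea of selecting context through cliques can be illustrated with a small sketch. It is not the authors' BGSM implementation, which is bilingual and whose details are not given here. It assumes a monolingual word co-occurrence graph built from a sliding window, reads "clique" as a complete subgraph that cannot be extended by any further word, and enumerates such cliques with the standard Bron-Kerbosch algorithm with pivoting. The helper names (`build_cooccurrence_graph`, `maximal_cliques`, `cliques_containing`) and the toy corpus are hypothetical.

```python
def build_cooccurrence_graph(sentences, window=2):
    """Hypothetical helper: connect words that co-occur within `window` tokens.

    Returns an undirected graph as a dict mapping each word to its neighbour set.
    """
    graph = {}
    for tokens in sentences:
        for i, w in enumerate(tokens):
            graph.setdefault(w, set())
            # Link w to the next `window` tokens to its right (edges are symmetric).
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                v = tokens[j]
                if v != w:
                    graph[w].add(v)
                    graph.setdefault(v, set()).add(w)
    return graph


def maximal_cliques(graph):
    """Bron-Kerbosch with pivoting: yield every maximal clique as a frozenset."""
    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already-processed words.
        if not p and not x:
            yield frozenset(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex covering the most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(graph[u] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


def cliques_containing(word, cliques):
    """Hypothetical helper: cliques that contain `word`, i.e. candidate sense contexts."""
    return [c for c in cliques if word in c]


if __name__ == "__main__":
    corpus = [["river", "bank", "water"], ["money", "bank", "loan"]]
    g = build_cooccurrence_graph(corpus, window=2)
    for clique in cliques_containing("bank", list(maximal_cliques(g))):
        print(sorted(clique))
```

In this toy corpus the word form `bank` appears in two separate cliques, one with `river` and `water` and one with `money` and `loan`. This shows why grouping context by cliques, rather than pooling every co-occurrence of a word form, can separate distinct senses.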