Instance Weighting with Applications to Cross-domain Text Classification via Trading off Sample Selection Bias and Variance

Rui Xia, Zhenchun Pan, Feng Xu

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 4489-4495. https://doi.org/10.24963/ijcai.2018/624

Domain adaptation is an important problem in natural language processing (NLP) because of the distributional difference between the labeled source domain and the target domain. In this paper, we study domain adaptation from the instance weighting perspective. By using the density ratio as the instance weight, traditional instance weighting approaches can in principle correct the sample selection bias in domain adaptation. In practice, however, researchers have often failed to achieve good performance when applying instance weighting to domain adaptation in NLP, and many negative results have been reported in the literature. In this work, we conduct an in-depth study of the causes of this failure and find that previous work focused only on reducing sample selection bias while ignoring another important factor in domain adaptation: sample selection variance. On this basis, we propose a new instance weighting framework that trades off the two factors in instance weight learning. We evaluate our approach on two cross-domain text classification tasks and compare it with eight instance weighting methods. The results demonstrate our approach's advantages in domain adaptation performance, optimization efficiency, and parameter stability.
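To make the density-ratio idea mentioned above concrete, here is a minimal sketch of the classical instance weighting recipe (not the paper's proposed framework): a domain discriminator estimates the ratio p_target(x)/p_source(x) up to a constant, and source instances are reweighted by that ratio when training the task classifier. The synthetic data, the use of scikit-learn's LogisticRegression, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic example of covariate shift: source and target share the same
# labeling rule (y = 1 iff x0 + x1 > 0) but differ in input distribution.
X_src = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)
X_tgt = rng.normal(loc=1.0, scale=1.0, size=(500, 2))  # shifted inputs

# Step 1: train a domain discriminator (label 0 = source, 1 = target).
X_dom = np.vstack([X_src, X_tgt])
d_dom = np.concatenate([np.zeros(len(X_src)), np.ones(len(X_tgt))])
disc = LogisticRegression().fit(X_dom, d_dom)

# Step 2: density ratio on source instances,
# w(x) = p_tgt(x) / p_src(x) ∝ P(target | x) / P(source | x).
p_tgt = disc.predict_proba(X_src)[:, 1]
w = p_tgt / (1.0 - p_tgt)

# Step 3: fit the task classifier on source data with instance weights,
# so training emphasizes source points that look like target points.
clf = LogisticRegression().fit(X_src, y_src, sample_weight=w)
```

The ratio-of-probabilities trick in Step 2 is the standard discriminative density-ratio estimator; when the discriminator is overconfident, the weights become extreme, which is one source of the high sample selection variance this paper identifies.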
Keywords:
Natural Language Processing: Sentiment Analysis and Text Mining
Natural Language Processing: Text Classification