Positive and Unlabeled Learning via Loss Decomposition and Centroid Estimation

Hong Shi, Shaojun Pan, Jian Yang, Chen Gong

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Main track. Pages 2689-2695. https://doi.org/10.24963/ijcai.2018/373

Positive and Unlabeled learning (PU learning) aims to train a binary classifier from only positive and unlabeled examples, where each unlabeled example may be either positive or negative. State-of-the-art algorithms usually cast PU learning as a cost-sensitive learning problem and assign distinct weights to different training examples, either manually or automatically. However, such weight adjustment or estimation can be inaccurate and thus often leads to unsatisfactory performance. Therefore, this paper regards all unlabeled examples as negative, which means that some of the original positive data are mistakenly labeled as negative. By doing so, we convert PU learning into a risk minimization problem in the presence of false negative label noise, and propose a novel PU learning algorithm termed "Loss Decomposition and Centroid Estimation" (LDCE). By decomposing the hinge loss function into two parts, we show that only the second part is influenced by label noise, and that its adverse effect can be reduced by estimating the centroid of negative examples. We extensively validate our approach on a synthetic dataset, UCI benchmark datasets, and real-world datasets, and the experimental results demonstrate the effectiveness of our approach when compared with other state-of-the-art PU learning methods.
Keywords:
Machine Learning: Classification
Machine Learning: Machine Learning
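
The loss decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form, so the code below assumes a standard even/odd split of the hinge loss for a linear scorer f(x) = w . x with |w . x| <= 1. Under that assumption the even part does not depend on the labels, and the odd part reduces to a dot product between w and a label-weighted mean of the inputs. That mean is only an illustrative stand-in: the paper estimates the centroid of negative examples, and its estimator is not described here. All function names are hypothetical.

```python
def hinge(z):
    """Hinge loss on the margin z = y * f(x)."""
    return max(0.0, 1.0 - z)


def decompose(z):
    """Split hinge(z) into an even part and an odd part (assumed form).

    even(z) == even(-z), so flipping a label y in {-1, +1} cannot change it.
    odd(z) == -odd(-z), so this is the only part that label noise can affect.
    """
    even = 0.5 * (hinge(z) + hinge(-z))
    odd = 0.5 * (hinge(z) - hinge(-z))
    return even, odd


def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def empirical_risk(w, X, y):
    """Average hinge loss of the linear scorer w on (X, y)."""
    return sum(hinge(yi * dot(w, xi)) for xi, yi in zip(X, y)) / len(X)


def decomposed_risk(w, X, y):
    """The same risk written as (label-free part) - w . (label-weighted mean).

    Assumption: |w . x| <= 1 for every x, which makes the odd part of the
    hinge loss exactly linear: odd(y * s) = -y * s.
    """
    n = len(X)
    label_free = sum(decompose(dot(w, xi))[0] for xi in X) / n
    weighted_mean = [
        sum(yi * xi[j] for xi, yi in zip(X, y)) / n for j in range(len(w))
    ]
    return label_free - dot(w, weighted_mean)


if __name__ == "__main__":
    # Toy data chosen so that |w . x| <= 1 holds for every example.
    X = [[0.5, 0.2], [-0.3, 0.4], [0.1, -0.6], [-0.2, -0.1]]
    y = [1, -1, 1, -1]
    w = [0.8, -0.5]
    print(empirical_risk(w, X, y), decomposed_risk(w, X, y))
```

Because the labels enter the decomposed risk only through the mean term, a better estimate of that single vector is enough to limit the effect of mislabeled examples, which is the intuition the abstract describes.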