Learning from Long-Tailed Noisy Data with Sample Selection and Balanced Loss
Lefan Zhang, Zhang-Hao Tian, Wujun Zhou, Wei Wang
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 5471-5480.
https://doi.org/10.24963/ijcai.2024/605
The success of deep learning depends on large-scale, well-curated training data, whereas data in real-world applications are commonly long-tailed and noisy. Existing methods usually rely on label frequency to tackle class imbalance; however, a model's bias across classes is not directly tied to label frequency, and the true label frequencies are inaccessible under label noise. To address this, we propose a robust method for learning from long-tailed noisy data with sample selection and balanced loss. Specifically, we separate the noisy training data into a clean labeled set and an unlabeled set via sample selection, and train the deep neural network in a semi-supervised manner with a balanced loss based on model bias. Extensive experiments on benchmarks demonstrate that our method outperforms existing state-of-the-art methods.
Keywords:
Machine Learning: ML: Classification
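The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact selection rule or loss form, so the code below assumes two common choices for illustration only: a small-loss criterion for sample selection, and a logit adjustment driven by an estimate of the model's per-class bias (e.g., its average predicted probability per class) rather than by label frequency. Function names (`select_clean`, `bias_balanced_logits`) and the mean-loss threshold are hypothetical, not the authors' method.

```python
import numpy as np

def select_clean(losses, threshold=None):
    """Split samples into a clean (small-loss) set and an unlabeled set.

    losses: per-sample training losses. Samples whose loss falls below the
    threshold are treated as clean; the rest have their labels discarded
    and are used as unlabeled data in semi-supervised training.
    """
    if threshold is None:
        threshold = np.mean(losses)  # simple illustrative threshold
    clean_mask = losses < threshold
    return clean_mask, ~clean_mask

def bias_balanced_logits(logits, class_bias, tau=1.0):
    """Adjust logits by the model's estimated per-class bias.

    class_bias: estimated bias toward each class (e.g., the model's average
    softmax probability per class over a held-out batch), NOT the label
    frequency. Subtracting tau * log(bias) penalizes classes the model
    already favors, balancing the resulting cross-entropy loss.
    """
    return logits - tau * np.log(class_bias + 1e-12)

# Toy usage: four samples, two with small losses (likely clean labels).
losses = np.array([0.1, 0.2, 2.0, 3.0])
clean, unlabeled = select_clean(losses)

# A model biased toward class 0 (bias 0.7 vs. 0.3): adjustment shrinks
# the logit gap in favor of the under-predicted class.
logits = np.array([2.0, 1.0])
adjusted = bias_balanced_logits(logits, class_bias=np.array([0.7, 0.3]))
```

Here the selection feeds a semi-supervised learner (clean samples keep their labels; the rest are pseudo-labeled), and the bias-balanced logits replace the raw logits inside the cross-entropy loss.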