Data Valuation in Machine Learning: "Ingredients", Strategies, and Open Challenges
Rachael Hwee Ling Sim, Xinyi Xu, Bryan Kian Hsiang Low
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Survey Track. Pages 5607-5614.
https://doi.org/10.24963/ijcai.2022/782
Data valuation in machine learning (ML) is an emerging research area that studies the worth of data in ML. It is used in collaborative ML to determine fair compensation for every data owner and in interpretable ML to identify the most responsible, noisy, or misleading training examples. This paper presents a comprehensive technical survey that provides a new formal study of data valuation in ML through its “ingredients” and the corresponding properties, grounds the discussion of common desiderata satisfied by existing data valuation strategies in our proposed ingredients, and identifies open research challenges for designing new ingredients, data valuation strategies, and cost reduction techniques.
Keywords:
Survey Track: Machine Learning
Survey Track: AI Ethics, Trust, Fairness
Survey Track: Multidisciplinary Topics and Applications