Data Valuation in Machine Learning: "Ingredients", Strategies, and Open Challenges

Rachael Hwee Ling Sim, Xinyi Xu, Bryan Kian Hsiang Low

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Survey Track. Pages 5607-5614. https://doi.org/10.24963/ijcai.2022/782

Data valuation in machine learning (ML) is an emerging research area that studies the worth of data in ML. Data valuation is used in collaborative ML to determine fair compensation for every data owner and in interpretable ML to identify the most responsible, noisy, or misleading training examples. This paper presents a comprehensive technical survey that makes three contributions: it provides a new formal study of data valuation in ML through its “ingredients” and their corresponding properties; it grounds the discussion of common desiderata satisfied by existing data valuation strategies in our proposed ingredients; and it identifies open research challenges for designing new ingredients, data valuation strategies, and cost reduction techniques.

Keywords:
Survey Track: Machine Learning
Survey Track: AI Ethics, Trust, Fairness
Survey Track: Multidisciplinary Topics and Applications