TDG4Crowd: Test Data Generation for Evaluation of Aggregation Algorithms in Crowdsourcing

Yili Fang, Chaojie Shen, Huamao Gu, Tao Han, Xinyi Ding

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence
Main Track. Pages 2984-2992. https://doi.org/10.24963/ijcai.2023/333

In crowdsourcing, existing efforts mainly use real datasets collected from crowdsourcing platforms as test datasets to evaluate the effectiveness of aggregation algorithms. However, these works ignore the fact that datasets obtained through crowdsourcing are usually sparse and imbalanced due to limited budgets. As a result, applying the same aggregation algorithm to different datasets often yields contradictory conclusions. For example, on the RTE dataset the Dawid-Skene model performs significantly better than Majority Voting, while on the LabelMe dataset the experiments give the opposite conclusion. Obtaining comprehensive and balanced datasets at a low cost is challenging, and to the best of our knowledge, little effort has been made toward the fair evaluation of aggregation algorithms. To fill this gap, we propose a novel method named TDG4Crowd that can automatically generate comprehensive and balanced datasets. Evaluated using Kullback–Leibler divergence and the Kolmogorov–Smirnov test, the experimental results show the superiority of our method compared with others. Aggregation algorithms also perform more consistently on the synthetic datasets generated using our method.
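
To make the two evaluation criteria concrete, the following is a minimal sketch (not from the paper) of how a synthetic sample might be compared against a real one using KL divergence over histogram estimates and a two-sample Kolmogorov–Smirnov test. The function name, the beta-distributed worker-accuracy samples, and the binning scheme are all hypothetical stand-ins; only the SciPy calls (scipy.stats.entropy, scipy.stats.ks_2samp) are standard.

    import numpy as np
    from scipy.stats import entropy, ks_2samp

    def compare_samples(real, synthetic, bins=20):
        # Shared bin edges so both histograms live on the same support.
        edges = np.histogram_bin_edges(np.concatenate([real, synthetic]), bins=bins)
        p, _ = np.histogram(real, bins=edges, density=True)
        q, _ = np.histogram(synthetic, bins=edges, density=True)
        # A small epsilon keeps the KL divergence finite in empty bins;
        # scipy's entropy(p, q) computes D_KL(p || q) after normalizing.
        kl = entropy(p + 1e-12, q + 1e-12)
        # The two-sample KS test works directly on the raw samples.
        ks_stat, p_value = ks_2samp(real, synthetic)
        return kl, ks_stat, p_value

    # Hypothetical worker-accuracy samples standing in for real and
    # generated crowdsourcing data.
    rng = np.random.default_rng(0)
    real = rng.beta(8, 2, size=1000)
    synthetic = rng.beta(7.5, 2.2, size=1000)
    print(compare_samples(real, synthetic))

Under this reading, a lower KL divergence and a smaller KS statistic (with a larger p-value) would indicate that the generated dataset more closely matches the real one.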
Keywords:
Humans and AI: HAI: Human computation and crowdsourcing
Machine Learning: ML: Autoencoders
Machine Learning: ML: Cost-sensitive learning