Abstract
Opportunities or Risks to Reduce Labor in Crowdsourcing Translation? Characterizing Cost versus Quality via a PageRank-HITS Hybrid Model / 1025
Rui Yan, Yiping Song, Cheng-Te Li, Ming Zhang, Xiaohua Hu
Crowdsourcing machine translation offers the advantage of lower monetary cost for collecting translated data. Yet, compared with translation by trained professionals, results collected from non-professional translators may be of low quality. A common solution for crowdsourcing practitioners is to employ a large labor force to gather enough redundant data and then select the best outputs from it. In fact, we can save even more money by avoiding the collection of bad translations in the first place. We propose to score Turkers by their authority during an observation period, and then stop hiring the unqualified ones. This brings both opportunities and risks to crowdsourced translation: we can make cheap translation even cheaper, but we may suffer a loss in quality. In this paper, we propose a graph-based PageRank-HITS hybrid model to distinguish authoritative workers from unreliable ones. The algorithm captures the intuition that good translations and good workers reinforce each other iteratively within the proposed framework. We demonstrate that the algorithm maintains translation performance while reducing the workforce and hence cutting cost. We run experiments on the NIST 2009 Urdu-to-English evaluation set with Mechanical Turk, and quantitatively evaluate performance in terms of BLEU score, Pearson correlation, and real monetary cost.
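To make the mutual-reinforcement idea concrete, the sketch below shows a generic HITS-style iteration with PageRank-like damping on a bipartite worker-translation graph: worker authority is propagated to the translations they produced, and translation quality is propagated back to workers. This is only a minimal illustration under assumed details, not the authors' actual model; the function name, the damping constant, and the worker-selection cutoff are all hypothetical.

```python
import numpy as np

def score_workers_and_translations(A, damping=0.85, iters=50, tol=1e-8):
    """Illustrative mutual-reinforcement scoring (assumed formulation).

    A: matrix of shape (n_workers, n_translations); A[w, t] > 0 if worker w
    produced translation t (binary or weighted).
    Returns per-worker authority scores and per-translation quality scores.
    """
    n_workers, n_translations = A.shape
    worker_auth = np.ones(n_workers) / n_workers
    trans_quality = np.ones(n_translations) / n_translations

    for _ in range(iters):
        # Translations gain quality from the authority of the workers who
        # produced them, smoothed by a PageRank-style teleportation term.
        new_quality = (1 - damping) / n_translations + damping * (A.T @ worker_auth)
        new_quality /= new_quality.sum()

        # Workers gain authority from the quality of their translations.
        new_auth = (1 - damping) / n_workers + damping * (A @ new_quality)
        new_auth /= new_auth.sum()

        converged = np.abs(new_auth - worker_auth).sum() < tol
        worker_auth, trans_quality = new_auth, new_quality
        if converged:
            break

    return worker_auth, trans_quality

# Usage: rank workers, then stop hiring those below a chosen authority cutoff.
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]], dtype=float)
auth, quality = score_workers_and_translations(A)
keep = auth >= np.median(auth)  # illustrative threshold, not from the paper
```

In this toy setup, workers whose translations also receive support from other reliable workers accumulate higher authority over the iterations, which is the intuition the abstract describes for deciding which Turkers to stop hiring.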