Lightweight Random Indexing for Polylingual Text Classification (Extended Abstract)

Alejandro Moreo Fernández, Andrea Esuli, Fabrizio Sebastiani

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
Journal track. Pages 5642-5646. https://doi.org/10.24963/ijcai.2018/801

Polylingual Text Classification (PLC) is a supervised learning task that consists of assigning class labels to documents written in different languages, assuming that a representative set of training documents is available for each language. This scenario is increasingly frequent, given the large number of multilingual platforms and communities emerging on the Internet. In this work we analyse some important methods proposed in the literature that are machine-translation-free and dictionary-free, and we propose a particular configuration of the Random Indexing method, which we dub Lightweight Random Indexing. We show that it outperforms all compared algorithms and also displays a significantly reduced computational cost.
Keywords:
Natural Language Processing: Text Classification
Machine Learning: Multi-instance; Multi-label; Multi-view learning
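
For readers unfamiliar with the Random Indexing method named in the abstract, the following is a minimal sketch of generic Random Indexing applied to a bag-of-words matrix whose vocabulary spans several languages. It is not the authors' Lightweight Random Indexing configuration, which the abstract does not specify. The function names, the toy vocabulary, and the parameter values (dim=8, k=2) are assumptions chosen for illustration only.

```python
import numpy as np

def random_index_vectors(vocab_size, dim, k, seed=0):
    """Build one sparse ternary index vector per term.

    Each row has exactly k nonzero entries, each drawn from {-1, +1}.
    The values of dim and k are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    index = np.zeros((vocab_size, dim))
    for term in range(vocab_size):
        positions = rng.choice(dim, size=k, replace=False)
        index[term, positions] = rng.choice([-1.0, 1.0], size=k)
    return index

def project(doc_term, index):
    """Map documents into the random space by summing the index
    vectors of their terms, weighted by the term weights."""
    return doc_term @ index

# Hypothetical toy vocabulary shared by two languages (juxtaposed features).
vocab = ["house", "dog", "casa", "perro"]

# One English and one Spanish document as raw term-frequency rows.
doc_term = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
])

index = random_index_vectors(len(vocab), dim=8, k=2)
projected = project(doc_term, index)
print(projected.shape)
```

In this sketch, documents from all languages end up in one shared space of dimension dim, so a single classifier can be trained on the projected rows without machine translation or a bilingual dictionary.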