The Information Retrieval Experiment Platform (Extended Abstract)

Maik Fröbe, Jan Heinrich Reimer, Sean MacAvaney, Niklas Deckers, Simon Reich, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024), Sister Conferences Best Papers track, pages 8405–8410. https://doi.org/10.24963/ijcai.2024/931

We have built TIREx, the Information Retrieval Experiment Platform, to promote standardized, reproducible, scalable, and blinded retrieval experiments. Standardization is achieved through integration with PyTerrier's interfaces and compatibility with ir_datasets and ir_measures. Reproducibility and scalability rest on the underlying TIRA framework, which runs dockerized software in a cloud-native execution environment. Using Docker images of 50 standard retrieval approaches, we evaluated all of them on 32 tasks (i.e., 1,600 runs) in less than a week on a midsize cluster (1,620 CPU cores and 24 GPUs), demonstrating multi-task scalability. Importantly, TIRA also enables blind evaluation of AI experiments: the test data can be hidden from public access, and the tested approaches run in a sandbox that prevents data leaks. Hiding the test data also ensures that it cannot be used by third parties for LLM training, preventing future training-test leaks.
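To illustrate the kind of standardized experiment this interface enables, the following minimal Python sketch loads a test collection through PyTerrier's ir_datasets integration, indexes it, and evaluates a BM25 baseline with ir_measures metrics. The 'vaswani' collection and the local index path are illustrative assumptions, not details from the paper; on TIREx, such a script would be submitted as a Docker image and executed in the sandbox rather than run locally against public test data.

    import pyterrier as pt
    from ir_measures import AP, nDCG

    if not pt.started():
        pt.init()

    # Load a test collection via PyTerrier's ir_datasets integration
    # ('vaswani' is an illustrative choice of corpus).
    dataset = pt.get_dataset("irds:vaswani")

    # Build a Terrier index over the corpus (the local path is an assumption).
    indexer = pt.IterDictIndexer("./vaswani-index")
    index_ref = indexer.index(dataset.get_corpus_iter())

    # A standard BM25 baseline, expressed as a PyTerrier transformer.
    bm25 = pt.BatchRetrieve(index_ref, wmodel="BM25")

    # Evaluate against the collection's relevance judgments,
    # using ir_measures measure objects as metrics.
    results = pt.Experiment(
        [bm25],
        dataset.get_topics(),
        dataset.get_qrels(),
        eval_metrics=[AP, nDCG @ 10],
    )
    print(results)

Because every retrieval approach is expressed against the same PyTerrier transformer interface, the platform can swap in any of the 50 dockerized approaches and any of the 32 tasks without changing the experiment harness.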
Keywords:
Natural Language Processing: Information retrieval and text mining
Natural Language Processing: Resources and evaluation
Natural Language Processing: Tools