MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
AI, Arts & Creativity. Pages 7771-7779. https://doi.org/10.24963/ijcai.2024/860

Rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance in understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and the low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in colloquial Chinese, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP), which employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of the annotations and their alignment with popular semantics. Using this method, we built a large-scale, private dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in how they describe music, and empirically demonstrated the effectiveness of CaiMD for fine-tuning LLMs. Finally, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music.
Keywords:
Application domains: Music and sound
Methods and resources: AI methods for better understanding human creative processes
Methods and resources: AI systems for collaboration and co-creation
Methods and resources: Computational implementations inspired by fields such as psychology or cognitive science
Methods and resources: Datasets, knowledge bases and ontologies
Theory and philosophy of arts and creativity in AI systems: Autonomous creative or artistic AI
Theory and philosophy of arts and creativity in AI systems: Cultural and social impacts of AI on creativity, creative practice, education and society
Theory and philosophy of arts and creativity in AI systems: Ethical issues raised by creative AI systems
Theory and philosophy of arts and creativity in AI systems: Evaluation and curation of artistic or creative artefacts
Theory and philosophy of arts and creativity in AI systems: Other theory or philosophy of arts and creativity in AI
Theory and philosophy of arts and creativity in AI systems: Social (multi-agent) creativity and human-computer co-creation