MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

Zihao Wang; Shuyu Li; Tao Zhang; Qi Wang; Pengfei Yu; Jinyang Luo; Yan Liu; Ming Xi; Kejun Zhang

doi:10.24963/ijcai.2024/860

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence

AI, Arts & Creativity. Pages 7771-7779. https://doi.org/10.24963/ijcai.2024/860

The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music description datasets cannot serve as benchmarks. To this end, we present MuChin, the first open-source music description benchmark in Chinese colloquial language, designed to evaluate the performance of multimodal LLMs in understanding and describing music. We established the Caichong Music Annotation Platform (CaiMAP) that employs an innovative multi-person, multi-stage assurance method, and recruited both amateurs and professionals to ensure the precision of annotations and alignment with popular semantics. Utilizing this method, we built a large-scale, private dataset with multi-dimensional, high-precision music annotations, the Caichong Music Dataset (CaiMD), and carefully selected 1,000 high-quality entries to serve as the test set for MuChin. Based on MuChin, we analyzed the discrepancies between professionals and amateurs in terms of music description, and empirically demonstrated the effectiveness of CaiMD for fine-tuning LLMs. Ultimately, we employed MuChin to evaluate existing music understanding models on their ability to provide colloquial descriptions of music.

Keywords:

Application domains: Music and sound

Methods and resources: AI methods for better understanding human creative processes

Methods and resources: AI systems for collaboration and co-creation

Methods and resources: Computational implementations inspired by fields such as psychology or cognitive science

Methods and resources: Datasets, knowledge bases and ontologies

Theory and philosophy of arts and creativity in AI systems: Autonomous creative or artistic AI

Theory and philosophy of arts and creativity in AI systems: Cultural and social impacts of AI on creativity, creative practice, education and society

Theory and philosophy of arts and creativity in AI systems: Ethical issues raised by creative AI systems

Theory and philosophy of arts and creativity in AI systems: Evaluation and curation of artistic or creative artefacts

Theory and philosophy of arts and creativity in AI systems: Other theory or philosophy of arts and creativity in AI

Theory and philosophy of arts and creativity in AI systems: Social (multi-agent) creativity and human-computer co-creation