M2Beats: When Motion Meets Beats in Short-form Videos
Dongxiang Jiang, Yongchang Zhang, Shuai He, Anlong Ming
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 920-928.
https://doi.org/10.24963/ijcai.2024/102
In recent years, short-form videos have gained popularity, and editing that synchronizes motion with music is especially favored for its beat-matching effect. However, detecting motion rhythm poses a significant challenge: it is influenced by multiple factors and is difficult to define with explicit rules. Traditional methods that attempt to define motion rhythm by hand often yield unsatisfactory results. Learning-based methods can extract motion rhythm without relying on explicit rules, but they require high-quality datasets; unfortunately, existing datasets simply substitute music rhythm for motion rhythm, even though the two are not equivalent. To address these challenges, we present the motion rhythm dataset AIST-M2B, annotated with meticulously curated motion rhythm labels derived from the profound correlation between motion and music in professional dance. We further propose M2BNet, a network architecture trained on AIST-M2B that effectively extracts intricate motion rhythms by incorporating both human body structure and temporal information. Additionally, we introduce a pioneering algorithm for synchronizing motion rhythm with music beats. Experimental results substantiate the superior performance of our method over existing algorithms in motion rhythm analysis. Our code is available at https://github.com/mRobotit/M2Beats.
Keywords:
Computer Vision: CV: Image and video synthesis and generation
Computer Vision: CV: Image and video retrieval
Computer Vision: CV: Motion and tracking