Dialogue Cross-Enhanced Central Engagement Attention Model for Real-Time Engagement Estimation
Jun Yu, Keda Lu, Ji Zhao, Zhihong Wei, Iek-Heng Chu, Peng Chang
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 3187-3195.
https://doi.org/10.24963/ijcai.2024/353
Real-time engagement estimation has been an important research topic in human-computer interaction in recent years. The emergence of the NOvice eXpert Interaction (NOXI) dataset, enriched with frame-wise engagement annotations, has catalyzed a surge in research efforts in this domain. Existing feature sequence partitioning methods for ultra-long videos have encountered challenges such as insufficient information utilization and repetitive inference. Moreover, those studies focus mainly on the target participant's features without taking into account those of the interlocutor. To address these issues, we propose the center-based sliding window method to obtain feature subsequences. The core of these subsequences is modeled using our innovative Central Engagement Attention Model (CEAM). Additionally, we introduce the dialogue cross-enhanced module that effectively incorporates the interlocutor's features via cross-attention. Our proposed method outperforms the current best model, achieving a substantial gain of 1.5% in concordance correlation coefficient (CCC) and establishing a new state-of-the-art result. Our source code and model checkpoints are available at https://github.com/wujiekd/Dialogue-Cross-Enhanced-CEAM.
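As an illustration only, the following is a minimal PyTorch sketch of the two ideas named in the abstract, a center-based sliding window over frame-wise features and a cross-attention step that injects the interlocutor's features. It is not the authors' released implementation; the window length, feature dimension, and number of heads are assumed for the example.

import torch
import torch.nn as nn

def center_windows(features, win_len=64):
    # features: (T, D) frame-wise features for one participant.
    # Returns (T, win_len, D): one window per frame, with the target frame
    # at the center of its window (zero padding near the sequence edges).
    T, D = features.shape
    half = win_len // 2
    padded = torch.zeros(T + win_len, D)
    padded[half:half + T] = features
    return torch.stack([padded[t:t + win_len] for t in range(T)])

class DialogueCrossAttention(nn.Module):
    # Illustrative cross-attention: the target participant's window attends
    # to the interlocutor's window (queries from the target, keys/values
    # from the interlocutor), followed by a residual connection and LayerNorm.
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target, interlocutor):
        # target, interlocutor: (B, win_len, dim)
        enhanced, _ = self.attn(query=target, key=interlocutor, value=interlocutor)
        return self.norm(target + enhanced)

In this sketch, each frame's engagement would be predicted from the window centered on it, so the model reads the full temporal context around every frame without the repeated re-inference that non-overlapping partitioning can require.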
Keywords:
Humans and AI: HAI: Human-computer interaction
Humans and AI: HAI: Computer-aided education
Humans and AI: HAI: Personalization and user modeling