DifTraj: Diffusion Inspired by Intrinsic Intention and Extrinsic Interaction for Multi-Modal Trajectory Prediction
Yanghong Liu, Xingping Dong, Yutian Lin, Mang Ye
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1128-1136.
https://doi.org/10.24963/ijcai.2024/125
Recent years have witnessed the success of generative adversarial networks and diffusion models in multi-modal trajectory prediction. However, prevailing algorithms only explicitly consider human interaction and ignore the modeling of human intention, so the generated results can deviate substantially from real trajectories in some complex scenes. In this paper, we analyze the conditions of multi-modal trajectory prediction from two objective perspectives and propose a novel end-to-end framework based on the diffusion model to predict more precise and socially acceptable trajectories for humans. Firstly, a spatial-temporal aggregation module is built to extract the extrinsic interaction features for capturing socially acceptable behaviors. Secondly, we explicitly construct the intrinsic intention module to obtain intention features for precise prediction. Finally, we estimate a noise trajectory distribution with these two features as the initialization of the diffusion model and leverage the denoising process to obtain the final trajectories. Furthermore, to reduce the noise in the initial trajectory estimation, we present a novel sample consistency loss to constrain multiple predictions. Extensive experiments demonstrate that our method outperforms the state-of-the-art methods on the ETH-UCY and SDD benchmarks, specifically achieving 19.0%/24.2% ADE/FDE improvement on ETH-UCY.
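As a loose illustration only (the paper does not publish its exact formulation in this abstract), a "sample consistency" loss over K sampled trajectory predictions could be sketched as penalizing the spread of the samples around their consensus; the function name, shapes, and mean-deviation form below are all assumptions for illustration:

```python
import numpy as np

def sample_consistency_loss(samples: np.ndarray) -> float:
    """Hypothetical sketch: penalize the spread of K predicted
    trajectories around their mean, encouraging the multiple
    initial estimates to agree with each other.

    samples: shape (K, T, 2) -- K sampled trajectories,
             T future time steps, (x, y) coordinates.
    """
    # Consensus trajectory: per-step mean over the K samples.
    mean_traj = samples.mean(axis=0, keepdims=True)  # shape (1, T, 2)
    # Mean squared deviation of every sample from the consensus.
    return float(np.mean((samples - mean_traj) ** 2))

# Identical samples give zero loss; dispersed samples give a positive loss.
tight = np.zeros((4, 12, 2))
loose = np.random.default_rng(0).normal(size=(4, 12, 2))
print(sample_consistency_loss(tight))  # 0.0
print(sample_consistency_loss(loose) > 0.0)  # True
```

Such a term would be added to the usual trajectory regression objective with a small weight, pulling the diffusion model's initial estimates toward mutual agreement before denoising.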
Keywords:
Computer Vision: CV: Motion and tracking
Agent-based and Multi-agent Systems: MAS: Multi-agent learning
Machine Learning: ML: Time series and data streams