Make Bricks with a Little Straw: Large-Scale Spatio-Temporal Graph Learning with Restricted GPU-Memory Capacity

Binwu Wang; Pengkun Wang; Zhengyang Zhou; Zhe Zhao; Wei Xu; Yang Wang

doi:10.24963/ijcai.2024/264

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence

Main Track. Pages 2388-2396. https://doi.org/10.24963/ijcai.2024/264

Traffic prediction plays a key role in many smart-city applications: it helps traffic managers plan ahead, assists online ride-hailing companies in deploying vehicles sensibly, and gives safety authorities early warning of congestion. While increasingly complex models achieve impressive prediction performance, there are concerns about how well these models handle large-scale road networks. For researchers without access to powerful GPUs in particular, the heavy memory burden limits the usefulness of these models. In this paper, we take the first step toward learning on large-scale spatio-temporal graphs and propose a divide-and-conquer training strategy for Large Spatio-Temporal Graph Learning, namely LarSTL. The core idea is to divide the large graph into multiple subgraphs, which are treated as a stream of tasks on which the model is trained sequentially, conquering one subgraph at a time. We introduce a novel perspective based on the continual learning paradigm to achieve this goal. To mitigate forgetting of the knowledge learned from previous subgraphs, an experience-replay strategy consolidates that knowledge by replaying nodes sampled from previous subgraphs. Moreover, we configure a dedicated feature adaptor for each subgraph to extract subgraph-specific (personalized) features, which also helps consolidate the learned knowledge from the perspective of parameters. We conduct experiments on multiple large-scale traffic network datasets using a V100 GPU with only 16 GB of memory, and the results demonstrate that LarSTL achieves competitive performance and high efficiency.
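The training schedule described above (partition, train on one subgraph per task, replay nodes from earlier subgraphs, keep a private adaptor per subgraph) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the contiguous node partitioning, the scalar "backbone" weight, the bias-only adaptors, freezing earlier adaptors, and the ever-growing replay buffer are all hypothetical choices made for illustration, and the names `partition_nodes` and `train_subgraph_stream` are invented.

```python
import random


def partition_nodes(num_nodes: int, num_parts: int) -> list[list[int]]:
    """Split node ids 0..num_nodes-1 into num_parts contiguous, near-equal chunks.

    Hypothetical stand-in for a real graph partitioner; the abstract does not
    say how LarSTL divides the road network.
    """
    base, extra = divmod(num_nodes, num_parts)
    parts, start = [], 0
    for p in range(num_parts):
        end = start + base + (1 if p < extra else 0)
        parts.append(list(range(start, end)))
        start = end
    return parts


def train_subgraph_stream(xs, ys, num_parts, replay_per_part=2,
                          epochs=50, lr=0.05, seed=0):
    """Toy divide-and-conquer loop: one subgraph per task, with node replay.

    xs, ys: per-node scalar input and target (a toy stand-in for traffic series).
    Returns the shared weight, the per-subgraph adaptors, and each task's batch.
    """
    rng = random.Random(seed)
    parts = partition_nodes(len(xs), num_parts)
    w = 0.0            # shared "backbone" parameter, updated on every task
    adaptors = {}      # subgraph id -> private bias (toy feature adaptor)
    owner = {}         # node id -> id of the subgraph it belongs to
    replay = []        # node ids sampled from previously trained subgraphs
    batches = []
    for k, nodes in enumerate(parts):
        if not nodes:  # nothing to train on for an empty partition
            continue
        adaptors[k] = 0.0
        for n in nodes:
            owner[n] = k
        batch = nodes + replay  # current subgraph plus replayed old nodes
        batches.append(batch)
        for _ in range(epochs):
            grad_w, grad_b = 0.0, 0.0
            for n in batch:
                # Each node is predicted with its own subgraph's adaptor.
                err = w * xs[n] + adaptors[owner[n]] - ys[n]
                grad_w += err * xs[n]
                if owner[n] == k:
                    grad_b += err
            w -= lr * grad_w / len(batch)
            # Only the current adaptor is trained; earlier ones stay frozen.
            adaptors[k] -= lr * grad_b / len(nodes)
        # After finishing subgraph k, keep a few of its nodes for later replay.
        replay = replay + rng.sample(nodes, min(replay_per_part, len(nodes)))
    return w, adaptors, batches


if __name__ == "__main__":
    xs = [i / 12 for i in range(12)]
    ys = [2 * x + (0.0, 1.0, -1.0)[i // 4] for i, x in enumerate(xs)]
    w, adaptors, batches = train_subgraph_stream(xs, ys, num_parts=3)
    print([len(b) for b in batches])  # prints [4, 6, 8]
```

In this toy setting each task only ever touches one subgraph plus a small replay set, which is the property the abstract relies on to keep memory bounded; in a real spatio-temporal model the replay budget, and whether earlier adaptors are frozen, would be design decisions rather than the fixed choices made here.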

Keywords:

Data Mining: DM: Mining spatial and/or temporal data

Data Mining: DM: Big data and scalability

Data Mining: DM: Mining graphs