Adapting Meta Knowledge with Heterogeneous Information Network for COVID-19 Themed Malicious Repository Detection

Yiyue Qian, Yiming Zhang, Yanfang Ye, Chuxu Zhang

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence
Main Track. Pages 3684-3690. https://doi.org/10.24963/ijcai.2021/507

As cyberattacks caused by malware have proliferated during the pandemic, an automatic system for detecting COVID-19 themed malware on social coding platforms is urgently needed. Existing methods rely mainly on file content analysis and ignore the structured information among entities on these platforms. They also usually require sufficient data for model training, which impairs their performance in cases with limited data, a common situation in practice. To address these challenges, we develop Meta-AHIN, a novel model for COVID-19 themed malicious repository detection on GitHub. In Meta-AHIN, we first construct an attributed heterogeneous information network (AHIN) to model the code content and social coding properties in GitHub; we then exploit an attention-based graph convolutional neural network (AGCN) to learn repository embeddings and present a meta-learning framework for model optimization. To utilize unlabeled information in the AHIN and to account for the task influence of different types of repositories, we further incorporate a node attribute-based self-supervised module into the AGCN and a task-aware attention weight into the meta-learning framework. Extensive experiments on data collected from GitHub demonstrate that Meta-AHIN outperforms state-of-the-art methods.
Keywords:
Multidisciplinary Topics and Applications: Security and Privacy
Data Mining: Classification
Data Mining: Mining Graphs, Semi Structured Data, Complex Data
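
The idea of an attributed heterogeneous network with attention-based neighbor aggregation, as outlined in the abstract, can be illustrated with a minimal sketch. This is not the authors' Meta-AHIN implementation: the `AHIN` class, the node types ("repo", "user"), the relation names ("owns", "stars"), and the dot-product attention score are all assumptions chosen for illustration, and the sketch assumes every node carries an attribute vector of the same length. It omits the learned weights, the self-supervised module, and the meta-learning step entirely.

```python
import math
from collections import defaultdict


class AHIN:
    """Toy attributed heterogeneous information network (hypothetical structure)."""

    def __init__(self):
        self.node_type = {}                 # node id -> type, e.g. "repo" or "user"
        self.attrs = {}                     # node id -> attribute vector (list of floats)
        self.neighbors = defaultdict(list)  # node id -> [(relation, neighbor id)]

    def add_node(self, node_id, node_type, attrs):
        self.node_type[node_id] = node_type
        self.attrs[node_id] = list(attrs)

    def add_edge(self, src, relation, dst):
        # Store the typed edge in both directions so either endpoint can aggregate.
        self.neighbors[src].append((relation, dst))
        self.neighbors[dst].append((relation, src))


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(graph, node_id):
    """Return (embedding, weights) for one attention-weighted aggregation step.

    Assumption: the attention score is a plain dot product between the target
    node's attributes and each neighbor's attributes (no learned parameters).
    """
    target = graph.attrs[node_id]
    nbrs = [n for _, n in graph.neighbors[node_id]]
    if not nbrs:
        return list(target), []
    scores = [sum(a * b for a, b in zip(target, graph.attrs[n])) for n in nbrs]
    weights = softmax(scores)
    dim = len(target)
    emb = [sum(w * graph.attrs[n][i] for w, n in zip(weights, nbrs)) for i in range(dim)]
    return emb, weights


# Hypothetical toy data: one repository linked to two users.
g = AHIN()
g.add_node("repo_a", "repo", [1.0, 0.0])
g.add_node("user_x", "user", [1.0, 0.0])
g.add_node("user_y", "user", [0.0, 1.0])
g.add_edge("user_x", "owns", "repo_a")
g.add_edge("user_y", "stars", "repo_a")

embedding, weights = attention_aggregate(g, "repo_a")
```

In this toy setup the neighbor whose attributes align more closely with the repository's receives the larger attention weight, so it contributes more to the resulting embedding. A real model would replace the raw dot product with learned, type-aware projections.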