VF-Detector: Making Multi-Granularity Code Changes on Vulnerability Fix Detector Robust to Mislabeled Changes

Zhenkan Fu, Shikai Guo, Hui Li, Rong Chen, Xiaochen Li, He Jiang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 5817-5825. https://doi.org/10.24963/ijcai.2024/643

As software development projects increasingly rely on open-source software, users face the risk of security vulnerabilities from third-party libraries. To address label and character noise in code changes, we present VF-Detector, which automatically identifies bug-fix commits in real-world, noisy development environments. VF-Detector consists of three components: Data Pre-processing (DP), Vulnerability Confidence Computation (VCC), and Confidence Learning Denoising (CLD). The DP component preprocesses the code change data. The VCC component calculates a code-change confidence value for each bug fix by extracting features at multiple levels of granularity. The CLD component removes noise and improves model robustness by pruning noisy data according to these confidence values and by performing effort-aware adjustments. Experimental results show that VF-Detector outperforms state-of-the-art methods on the EffortCost@L and Popt@L metrics on Java and Python datasets, with improvements of 6.5% and 5% respectively for Java, and 23.4% and 17.8% respectively for Python.
Keywords:
Multidisciplinary Topics and Applications: MTA: Software engineering
Agent-based and Multi-agent Systems: MAS: Trust and reputation
Data Mining: DM: Applications
Machine Learning: ML: Applications
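
The abstract describes the CLD component as pruning noisy data using confidence values. The following is a minimal sketch of one generic way such pruning could work; it is not the authors' implementation. It assumes that each code change already has a confidence score for its given label (for example, a model's predicted probability of that label), and it uses a per-class mean as the pruning threshold, which is an assumption borrowed from common confident-learning practice. The function name `prune_low_confidence` and its parameters are hypothetical.

```python
from statistics import mean


def prune_low_confidence(samples, labels, confidences):
    """Drop samples whose confidence in their given label is below
    the mean confidence of all samples carrying that same label.

    Hypothetical helper: the per-class mean threshold is an assumed
    rule, not the one specified by VF-Detector.
    """
    # Per-class threshold: average confidence among samples with that label.
    thresholds = {
        cls: mean(c for c, y in zip(confidences, labels) if y == cls)
        for cls in set(labels)
    }
    # Keep only samples that reach the threshold of their own class.
    kept = [
        (s, y)
        for s, y, c in zip(samples, labels, confidences)
        if c >= thresholds[y]
    ]
    return kept, thresholds


# Toy data: label 1 = fix commit, label 0 = other commit (assumed encoding).
commits = ["c1", "c2", "c3", "c4"]
given_labels = [1, 1, 0, 0]
label_confidence = [0.9, 0.3, 0.8, 0.6]

kept, thresholds = prune_low_confidence(commits, given_labels, label_confidence)
print(kept)
```

In this toy run, `c2` and `c4` fall below their class thresholds and are treated as likely mislabeled, so only `c1` and `c3` remain for training. A fixed global threshold would be simpler, but a per-class threshold avoids systematically discarding the class on which the model is less confident.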