Enhancing the Unified Features to Locate Buggy Files by Exploiting the Sequential Nature of Source Code

Xuan Huo, Ming Li

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 1909-1915. https://doi.org/10.24963/ijcai.2017/265

Bug reports provide an effective way for end-users to disclose potential bugs hidden in a software system, but automatically locating the potentially buggy source files from a bug report remains a great challenge in software maintenance. Many previous approaches represent bug reports and source code using lexical and structural information and correlate their relevance by measuring their similarity. Recently, a CNN-based model was proposed to learn unified features for bug localization, which overcomes the difficulty of modeling natural and programming languages with different structural semantics. However, previous studies fail to capture the sequential nature of source code, which carries additional semantics beyond lexical and structural terms; such information is vital in modeling program functionalities and behaviors. In this paper, we propose a novel model, LS-CNN, which enhances the unified features by exploiting the sequential nature of source code. LS-CNN combines CNN and LSTM to extract semantic features for automatically identifying potentially buggy source code according to a bug report. Experimental results on widely used software projects indicate that LS-CNN significantly outperforms state-of-the-art methods in locating buggy files.
Keywords:
Machine Learning: Data Mining
Machine Learning: Machine Learning
Multidisciplinary Topics and Applications: Knowledge-based Software Engineering
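
To make the similarity-based baseline mentioned in the abstract concrete, the following is a minimal sketch of ranking source files against a bug report by lexical cosine similarity. It is not LS-CNN and not any specific prior method; the function names (`tokenize`, `cosine`, `rank_files`) and the example file contents are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and split on non-alphanumerics."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z0-9]+", text.lower()) if t]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(bug_report, source_files):
    """Return (path, score) pairs sorted from most to least similar."""
    query = Counter(tokenize(bug_report))
    scores = {
        path: cosine(query, Counter(tokenize(code)))
        for path, code in source_files.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical project contents, for illustration only.
    files = {
        "Parser.java": "class Parser { void parseDate(String input) { } }",
        "Renderer.java": "class Renderer { void drawButton(Color color) { } }",
    }
    report = "Crash when parsing an invalid date string"
    for path, score in rank_files(report, files):
        print(f"{path}: {score:.3f}")
```

A purely lexical score like this treats a file as an unordered bag of terms, which is the limitation the abstract points to: it ignores the order in which statements appear, and that ordering is what the proposed model aims to exploit.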