TY - GEN
T1 - (Partial) Program Dependence Learning
AU - Yadavally, Aashish
AU - Nguyen, Tien N.
AU - Wang, Wenbo
AU - Wang, Shaohua
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Code fragments from developer forums often migrate to applications due to the code reuse practice. Owing to the incomplete nature of such programs, analyzing them to early determine the presence of potential vulnerabilities is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%-97.17% and 92.46%-96.01%, respectively. We also test the usefulness of the PDGs predicted by NeuralPDA (i.e., PDG*) on the downstream task of method-level vulnerability detection. We discover that the performance of the vulnerability detection tool utilizing PDG* is only 1.1% less than that utilizing the PDGs generated by a program analysis tool. We also report the detection of 14 real-world vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.
AB - Code fragments from developer forums often migrate to applications due to the code reuse practice. Owing to the incomplete nature of such programs, analyzing them to early determine the presence of potential vulnerabilities is challenging. In this work, we introduce NeuralPDA, a neural network-based program dependence analysis tool for both complete and partial programs. Our tool efficiently incorporates intra-statement and inter-statement contextual features into statement representations, thereby modeling program dependence analysis as a statement-pair dependence decoding task. In the empirical evaluation, we report that NeuralPDA predicts the CFG and PDG edges in complete Java and C/C++ code with combined F-scores of 94.29% and 92.46%, respectively. The F-score values for partial Java and C/C++ code range from 94.29%-97.17% and 92.46%-96.01%, respectively. We also test the usefulness of the PDGs predicted by NeuralPDA (i.e., PDG*) on the downstream task of method-level vulnerability detection. We discover that the performance of the vulnerability detection tool utilizing PDG* is only 1.1% less than that utilizing the PDGs generated by a program analysis tool. We also report the detection of 14 real-world vulnerable code snippets from StackOverflow by a machine learning-based vulnerability detection tool that employs the PDGs predicted by NeuralPDA for these code snippets.
KW - deep learning
KW - neural networks
KW - neural partial program analysis
KW - neural program dependence analysis
UR - http://www.scopus.com/inward/record.url?scp=85171736652&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85171736652&partnerID=8YFLogxK
U2 - 10.1109/ICSE48619.2023.00209
DO - 10.1109/ICSE48619.2023.00209
M3 - Conference contribution
AN - SCOPUS:85171736652
T3 - Proceedings - International Conference on Software Engineering
SP - 2501
EP - 2513
BT - Proceedings - 2023 IEEE/ACM 45th International Conference on Software Engineering, ICSE 2023
PB - IEEE Computer Society
T2 - 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023
Y2 - 15 May 2023 through 16 May 2023
ER -