TY - GEN
T1 - Combining program analysis and statistical language model for code statement completion
AU - Nguyen, Son
AU - Nguyen, Tien
AU - Li, Yi
AU - Wang, Shaohua
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - Automatic code completion helps improve developers' productivity in their programming tasks. A program contains instructions expressed via code statements, which are considered as the basic units of program execution. In this paper, we introduce AutoSC, which combines program analysis and the principle of software naturalness to fill in a partially completed statement. AutoSC benefits from the strengths of both directions, in which the completed code statement is both frequent and valid. AutoSC is first trained on a large code corpus to derive the templates of candidate statements. Then, it uses program analysis to validate and concretize the templates into syntactically and type-valid candidate statements. Finally, these candidates are ranked by using a language model trained on the lexical form of the source code in the code corpus. Our empirical evaluation on the large datasets of real-world projects shows that AutoSC achieves 38.9-41.3% top-1 accuracy and 48.2-50.1% top-5 accuracy in statement completion. It also outperforms a state-of-the-art approach from 9X-69X in top-1 accuracy.
AB - Automatic code completion helps improve developers' productivity in their programming tasks. A program contains instructions expressed via code statements, which are considered as the basic units of program execution. In this paper, we introduce AutoSC, which combines program analysis and the principle of software naturalness to fill in a partially completed statement. AutoSC benefits from the strengths of both directions, in which the completed code statement is both frequent and valid. AutoSC is first trained on a large code corpus to derive the templates of candidate statements. Then, it uses program analysis to validate and concretize the templates into syntactically and type-valid candidate statements. Finally, these candidates are ranked by using a language model trained on the lexical form of the source code in the code corpus. Our empirical evaluation on the large datasets of real-world projects shows that AutoSC achieves 38.9-41.3% top-1 accuracy and 48.2-50.1% top-5 accuracy in statement completion. It also outperforms a state-of-the-art approach from 9X-69X in top-1 accuracy.
KW - Code Completion
KW - Program Analysis
KW - Statement Completion
KW - Statistical Language Model
UR - http://www.scopus.com/inward/record.url?scp=85078944075&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078944075&partnerID=8YFLogxK
U2 - 10.1109/ASE.2019.00072
DO - 10.1109/ASE.2019.00072
M3 - Conference contribution
AN - SCOPUS:85078944075
T3 - Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
SP - 710
EP - 721
BT - Proceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
Y2 - 10 November 2019 through 15 November 2019
ER -