Combining program analysis and statistical language model for code statement completion

Son Nguyen, Tien Nguyen, Yi Li, Shaohua Wang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

21 Scopus citations

Abstract

Automatic code completion helps improve developers' productivity in their programming tasks. A program contains instructions expressed via code statements, which are considered as the basic units of program execution. In this paper, we introduce AutoSC, which combines program analysis and the principle of software naturalness to fill in a partially completed statement. AutoSC benefits from the strengths of both directions, in which the completed code statement is both frequent and valid. AutoSC is first trained on a large code corpus to derive the templates of candidate statements. Then, it uses program analysis to validate and concretize the templates into syntactically and type-valid candidate statements. Finally, these candidates are ranked by using a language model trained on the lexical form of the source code in the code corpus. Our empirical evaluation on the large datasets of real-world projects shows that AutoSC achieves 38.9-41.3% top-1 accuracy and 48.2-50.1% top-5 accuracy in statement completion. It also outperforms a state-of-the-art approach from 9X-69X in top-1 accuracy.

Original languageEnglish (US)
Title of host publicationProceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages710-721
Number of pages12
ISBN (Electronic)9781728125084
DOIs
StatePublished - Nov 2019
Event34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019 - San Diego, United States
Duration: Nov 10 2019Nov 15 2019

Publication series

NameProceedings - 2019 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019

Conference

Conference34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019
Country/TerritoryUnited States
CitySan Diego
Period11/10/1911/15/19

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Software
  • Control and Optimization

Keywords

  • Code Completion
  • Program Analysis
  • Statement Completion
  • Statistical Language Model

Fingerprint

Dive into the research topics of 'Combining program analysis and statistical language model for code statement completion'. Together they form a unique fingerprint.

Cite this