DeMinify: Neural Variable Name Recovery and Type Inference

Yi Li, Aashish Yadavally, Jiaxing Zhang, Shaohua Wang, Tien N. Nguyen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

To avoid the exposure of original source code, the variable names deployed in the wild are often replaced by short, meaningless names, thus making the code difficult to understand and be analyzed. We introduce DeMinify, a Deep-Learning (DL)-based approach that formulates such recovery problem as the prediction of missing features in a Graph Convolutional Network-Missing Features. The graph represents both the relations among the variables and the relations among their types, in which the names or types of some nodes are missing. Moreover, DeMinify leverages dual-task learning to propagate the mutual impact between the learning of the variable names and that of their types. We conducted experiments to evaluate DeMinify in both name recovery and type prediction on a Python dataset with 180k methods and a JavaScript (JS) dataset with 322k files. For variable name prediction, in 76.7% and 81.6% of the cases in Python and JS code respectively, DeMinify can predict correctly the variables' names with a single suggested name. DeMinify relatively improves 15.3%-40.7% and 7.7%-49.7% in top-1 accuracy over the state-of-the-art variable name recovery approaches for Python and JS code, respectively. It also relatively improves 14.5%-51.9% in top-1 accuracy over the existing type prediction approaches. Our experimental results showed that learning of data types helps improve variable name recovery and vice versa.

Original languageEnglish (US)
Title of host publicationESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
EditorsSatish Chandra, Kelly Blincoe, Paolo Tonella
PublisherAssociation for Computing Machinery, Inc
Pages758-770
Number of pages13
ISBN (Electronic)9798400703270
DOIs
StatePublished - Nov 30 2023
Event31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023 - San Francisco, United States
Duration: Dec 3 2023Dec 9 2023

Publication series

NameESEC/FSE 2023 - Proceedings of the 31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Conference

Conference31st ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023
Country/TerritoryUnited States
CitySan Francisco
Period12/3/2312/9/23

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software

Keywords

  • Deep Learning
  • Minified Code
  • Name Recovery
  • Type Inference

Fingerprint

Dive into the research topics of 'DeMinify: Neural Variable Name Recovery and Type Inference'. Together they form a unique fingerprint.

Cite this