TY - GEN
T1 - Dlfix
T2 - 42nd ACM/IEEE International Conference on Software Engineering, ICSE 2020
AU - Yi, Li
AU - Wang, Shaohua
AU - Nguyen, Tien N.
N1 - Publisher Copyright:
© 2020 Association for Computing Machinery.
PY - 2020/6/27
Y1 - 2020/6/27
N2 - Automated Program Repair (APR) is very useful in helping developers in the process of software development and maintenance. Despite recent advances in deep learning (DL), the DL-based APR approaches still have limitations in learning bug-fixing code changes and the context of the surrounding source code of the bug-fixing code changes. These limitations lead to incorrect fixing locations or fixes. In this paper, we introduce DLFix, a two-tier DL model that treats APR as code transformation learning from the prior bug fixes and the surrounding code contexts of the fixes. The first layer is a tree-based RNN model that learns the contexts of bug fixes and its result is used as an additional weighting input for the second layer designed to learn the bug-fixing code transformations. We conducted several experiments to evaluate DLFix in two benchmarks: Defect4j and Bugs.jar, and a newly built bug datasets with a total of +20K real-world bugs in eight projects. We compared DLFix against a total of 13 state-of-the-art pattern-based APR tools. Our results show that DLFix can auto-fix more bugs than 11 of them, and is comparable and complementary to the top two pattern-based APR tools in which there are 7 and 11 unique bugs that they cannot detect, respectively, but we can. Importantly, DLFix is fully automated and data-driven, and does not require hard-coding of bug-fixing patterns as in those tools. We compared DLFix against 4 state-of-the-art deep learning based APR models. DLFix is able to fix 2.5 times more bugs than the best performing baseline.
AB - Automated Program Repair (APR) is very useful in helping developers in the process of software development and maintenance. Despite recent advances in deep learning (DL), the DL-based APR approaches still have limitations in learning bug-fixing code changes and the context of the surrounding source code of the bug-fixing code changes. These limitations lead to incorrect fixing locations or fixes. In this paper, we introduce DLFix, a two-tier DL model that treats APR as code transformation learning from the prior bug fixes and the surrounding code contexts of the fixes. The first layer is a tree-based RNN model that learns the contexts of bug fixes and its result is used as an additional weighting input for the second layer designed to learn the bug-fixing code transformations. We conducted several experiments to evaluate DLFix in two benchmarks: Defect4j and Bugs.jar, and a newly built bug datasets with a total of +20K real-world bugs in eight projects. We compared DLFix against a total of 13 state-of-the-art pattern-based APR tools. Our results show that DLFix can auto-fix more bugs than 11 of them, and is comparable and complementary to the top two pattern-based APR tools in which there are 7 and 11 unique bugs that they cannot detect, respectively, but we can. Importantly, DLFix is fully automated and data-driven, and does not require hard-coding of bug-fixing patterns as in those tools. We compared DLFix against 4 state-of-the-art deep learning based APR models. DLFix is able to fix 2.5 times more bugs than the best performing baseline.
KW - Automated program repair
KW - Context-based code transformation learning
KW - Deep learning
UR - http://www.scopus.com/inward/record.url?scp=85094122130&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094122130&partnerID=8YFLogxK
U2 - 10.1145/3377811.3380345
DO - 10.1145/3377811.3380345
M3 - Conference contribution
AN - SCOPUS:85094122130
T3 - Proceedings - International Conference on Software Engineering
SP - 602
EP - 614
BT - Proceedings - 2020 ACM/IEEE 42nd International Conference on Software Engineering, ICSE 2020
PB - IEEE Computer Society
Y2 - 27 June 2020 through 19 July 2020
ER -