TY - GEN
T1 - UTANGO
T2 - 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022
AU - Li, Yi
AU - Wang, Shaohua
AU - Nguyen, Tien N.
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/11/7
Y1 - 2022/11/7
N2 - During software evolution, developers make several changes and commit them into the repositories. Unfortunately, many of them tangle different purposes, both hampering program comprehension and reducing separation of concerns. Automated approaches with deterministic solutions have been proposed to untangle commits. However, specifying an effective clustering criteria on the changes in a commit for untangling is challenging for those approaches. In this work, we present UTango, a machine learning (ML)-based approach that learns to untangle the changes in a commit. We develop a novel code change clustering learning model that learns to cluster the code changes, represented by the embeddings, into different groups with different concerns. We adapt the agglomerative clustering algorithm into a supervised-learning clustering model operating on the learned code change embeddings via trainable parameters and a loss function in comparing the predicted clusters and the correct ones during training. To facilitate our clustering learning model, we develop a context-aware, graph-based, code change representation learning model, leveraging Label, Graph-based Convolution Network to produce the contextualized embeddings for code changes, that integrates program dependencies and the surrounding contexts of the changes. The contexts and cloned code are also explicitly represented, helping UTango distinguish the concerns. Our empirical evaluation on C# and Java datasets with 1,612 and 14k tangled commits show that it achieves the accuracy of 28.6%- 462.5% and 13.3%-100.0% relatively higher than the state-of-the-art commit-untangling approaches for C# and Java, respectively.
AB - During software evolution, developers make several changes and commit them into the repositories. Unfortunately, many of them tangle different purposes, both hampering program comprehension and reducing separation of concerns. Automated approaches with deterministic solutions have been proposed to untangle commits. However, specifying an effective clustering criteria on the changes in a commit for untangling is challenging for those approaches. In this work, we present UTango, a machine learning (ML)-based approach that learns to untangle the changes in a commit. We develop a novel code change clustering learning model that learns to cluster the code changes, represented by the embeddings, into different groups with different concerns. We adapt the agglomerative clustering algorithm into a supervised-learning clustering model operating on the learned code change embeddings via trainable parameters and a loss function in comparing the predicted clusters and the correct ones during training. To facilitate our clustering learning model, we develop a context-aware, graph-based, code change representation learning model, leveraging Label, Graph-based Convolution Network to produce the contextualized embeddings for code changes, that integrates program dependencies and the surrounding contexts of the changes. The contexts and cloned code are also explicitly represented, helping UTango distinguish the concerns. Our empirical evaluation on C# and Java datasets with 1,612 and 14k tangled commits show that it achieves the accuracy of 28.6%- 462.5% and 13.3%-100.0% relatively higher than the state-of-the-art commit-untangling approaches for C# and Java, respectively.
KW - Code Change Embeddings
KW - Commit Untangling
KW - Deep Learning
UR - http://www.scopus.com/inward/record.url?scp=85143064840&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143064840&partnerID=8YFLogxK
U2 - 10.1145/3540250.3549171
DO - 10.1145/3540250.3549171
M3 - Conference contribution
AN - SCOPUS:85143064840
T3 - ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 221
EP - 232
BT - ESEC/FSE 2022 - Proceedings of the 30th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Roychoudhury, Abhik
A2 - Cadar, Cristian
A2 - Kim, Miryung
PB - Association for Computing Machinery, Inc
Y2 - 14 November 2022 through 18 November 2022
ER -