TY - JOUR
T1 - CM-CASL
T2 - Comparison-based performance modeling of software systems via collaborative active and semisupervised learning
AU - Cao, Rong
AU - Bao, Liang
AU - Wu, Chase
AU - Zhangsun, Panpan
AU - Li, Yufei
AU - Zhang, Zhe
N1 - Funding Information:
This work is supported by the National Natural Science Foundation of China [Grant No. 62172316]; the Ministry of Education Humanities and Social Science Project of China [Grant No. 17YJA790047]; the Soft Science Research Plans of Shaanxi Province [Grant No. 2020KRZ018]; the Research Project on Major Theoretical and Practical Problems of Philosophy and Social Sciences in Shaanxi Province [Grant No. 20JZ-25]; the Key R&D Program of Shaanxi [Grant No. 2019ZDLGY13-03-02]; the Natural Science Foundation of Shaanxi Province, China [Grant No. 2019JM-368]; and the Key R&D Program of Hebei [Grant No. 20310102D].
Funding Information:
Chase Wu is a professor and the associate chair in the Department of Data Science. His work has been supported by various funding agencies, including NSF, DOE, DHS, and ORNL, where he is a collaborative research staff member. He has published about 300 research articles in highly reputed conference proceedings, journals, and books. His main research interests include big data, machine learning, high-performance networking, parallel and distributed computing, sensor networks, scientific visualization, and cyber security.
Publisher Copyright:
© 2023 Elsevier Inc.
PY - 2023/7
Y1 - 2023/7
AB - Configuration tuning for large software systems is generally challenging due to the complex configuration space and expensive performance evaluation. Most existing approaches follow a two-phase process, first learning a regression-based performance prediction model on available samples and then searching for the configurations with satisfactory performance using the learned model. Such regression-based models often suffer from the scarcity of samples due to the enormous time and resources required to run a large software system with a specific configuration. Moreover, previous studies have shown that even a highly accurate regression-based model may fail to discern the relative merit between two configurations, whereas performance comparison is actually one fundamental strategy for configuration tuning. To address these issues, this paper proposes CM-CASL, a Comparison-based performance Modeling approach for software systems via Collaborative Active and Semisupervised Learning. CM-CASL learns a classification model that compares the performance of two given configurations, and enhances the samples through a collaborative labeling process by both human experts and classifiers using an integration of active and semisupervised learning. Experimental results demonstrate that CM-CASL outperforms two state-of-the-art performance modeling approaches in terms of both classification accuracy and rank accuracy, and thus provides a better performance model for the subsequent work of configuration tuning.
KW - Active learning
KW - Comparison-based model
KW - Performance modeling
KW - Semisupervised learning
KW - Software systems
UR - http://www.scopus.com/inward/record.url?scp=85151789671&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85151789671&partnerID=8YFLogxK
U2 - 10.1016/j.jss.2023.111686
DO - 10.1016/j.jss.2023.111686
M3 - Article
AN - SCOPUS:85151789671
SN - 0164-1212
VL - 201
JO - Journal of Systems and Software
JF - Journal of Systems and Software
M1 - 111686
ER -