TY - GEN
T1 - On performance modeling and prediction in support of scientific workflow optimization
AU - Wu, Qishi
AU - Datla, Vivek V.
PY - 2011
Y1 - 2011
AB - The computing modules in distributed scientific workflows must be mapped to computer nodes in shared network environments for optimal workflow performance. Finding a good workflow mapping scheme critically depends on an accurate prediction of the execution time of each individual computational module in the workflow. There is no silver bullet for predicting the execution time of a scientific computation, as it is determined collectively by several dynamic system factors, including concurrent loads, memory size, and CPU speed, and also by the complexity of the computational program itself. This paper investigates the problem of modeling scientific computations and predicting their execution time based on a combination of hardware and software properties. We employ statistical learning techniques to estimate the effective computational power of a given computer node at any point in time and to estimate the total number of CPU cycles needed to execute a given computational program on any input data size. We analytically derive an upper bound on the estimation error for execution time prediction given the hardware and software properties. The proposed statistical analysis-based solution to performance modeling and prediction is validated by experimental results measured on computing nodes that vary significantly in their hardware specifications.
KW - Performance modeling
KW - Regression techniques
KW - Scientific computation
UR - http://www.scopus.com/inward/record.url?scp=80053407646&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053407646&partnerID=8YFLogxK
U2 - 10.1109/SERVICES.2011.37
DO - 10.1109/SERVICES.2011.37
M3 - Conference contribution
AN - SCOPUS:80053407646
SN - 9780769544618
T3 - Proceedings - 2011 IEEE World Congress on Services, SERVICES 2011
SP - 161
EP - 168
BT - Proceedings - 2011 IEEE World Congress on Services, SERVICES 2011
T2 - 2011 IEEE World Congress on Services, SERVICES 2011
Y2 - 4 July 2011 through 9 July 2011
ER -