TY - GEN
T1 - Performance optimization of budget-constrained MapReduce workflows in multi-clouds
AU - Cao, Huiyan
AU - Wu, Chase Q.
N1 - Funding Information:
It is of our future interest to refine and generalize the mathematical models to achieve a higher level of accuracy for workflow execution time measurement in real-world cloud environments. For example, the actual execution time of different programs on different types of VMs depends on many factors such as program structures and machine configurations. In particular, when multiple VMs are provisioned on the same physical server, the performances of those VMs are correlated and constrained by the physical machine. Moreover, in real networks, physical servers may fail with a certain probability and the actual workload of workflow modules may be subject to dynamic changes. We will consider and address such practical issues in our future work. Acknowledgement: This research is sponsored by the U.S. National Science Foundation under Grant No. CNS-1560698 with New Jersey Institute of Technology.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/13
Y1 - 2018/7/13
N2 - With the rapid deployment of cloud infrastructures around the globe and the economic benefit of cloud-based computing and storage services, an increasing number of scientific workflows have been shifted or are in active transition to clouds. As the scale of scientific applications continues to grow, it is now common to deploy data- and network-intensive computing workflows across multi-clouds, where inter-cloud data transfer has a significant impact on both workflow performance and financial cost. We construct rigorous mathematical models to analyze intra- and inter-cloud execution dynamics of scientific workflows and formulate a budget-constrained workflow mapping problem to optimize the network performance of MapReduce-based scientific workflows in Hadoop systems in multi-cloud environments. We show this problem to be NP-complete and design a heuristic solution that takes into consideration module execution, data transfer, and I/O operations. The performance superiority of the proposed mapping solution over existing methods is illustrated through extensive simulations and further verified by real-life workflow experiments deployed in public clouds. We observe a discrepancy of about 15% between our theoretical estimates and real-world experimental measurements, which validates the correctness of our cost models and also ensures accurate workflow mapping in real systems.
AB - With the rapid deployment of cloud infrastructures around the globe and the economic benefit of cloud-based computing and storage services, an increasing number of scientific workflows have been shifted or are in active transition to clouds. As the scale of scientific applications continues to grow, it is now common to deploy data- and network-intensive computing workflows across multi-clouds, where inter-cloud data transfer has a significant impact on both workflow performance and financial cost. We construct rigorous mathematical models to analyze intra- and inter-cloud execution dynamics of scientific workflows and formulate a budget-constrained workflow mapping problem to optimize the network performance of MapReduce-based scientific workflows in Hadoop systems in multi-cloud environments. We show this problem to be NP-complete and design a heuristic solution that takes into consideration module execution, data transfer, and I/O operations. The performance superiority of the proposed mapping solution over existing methods is illustrated through extensive simulations and further verified by real-life workflow experiments deployed in public clouds. We observe a discrepancy of about 15% between our theoretical estimates and real-world experimental measurements, which validates the correctness of our cost models and also ensures accurate workflow mapping in real systems.
KW - Cloud computing
KW - MapReduce
KW - Performance optimization
KW - Scientific workflows
KW - Workflow mapping
UR - http://www.scopus.com/inward/record.url?scp=85050964560&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050964560&partnerID=8YFLogxK
U2 - 10.1109/CCGRID.2018.00039
DO - 10.1109/CCGRID.2018.00039
M3 - Conference contribution
AN - SCOPUS:85050964560
T3 - Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
SP - 243
EP - 252
BT - Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
Y2 - 1 May 2018 through 4 May 2018
ER -