TY - JOUR
T1 - Harnessing Data Movement in Virtual Clusters for In-Situ Execution
AU - Huang, Dan
AU - Liu, Qing
AU - Klasky, Scott
AU - Wang, Jun
AU - Choi, Jong Youl
AU - Logan, Jeremy
AU - Podhorszki, Norbert
N1 - Funding Information:
This work is supported in part by US National Science Foundation Grants CCF-1718297, CCF-1527249, CCF-1337244, and CCF-1717338, the US Army/DURIP program under award W911NF-17-1-0208, and the Department of Energy Advanced Scientific Computing Research program. The experiments in this work were conducted on the PRObE Marmot cluster, which is supported in part by National Science Foundation awards CNS-1042537 and CNS-1042543 (PRObE).
Publisher Copyright:
© 1990-2012 IEEE.
PY - 2019/3/1
Y1 - 2019/3/1
N2 - As a result of increasing data volume and velocity, Big Data science at exascale has shifted towards the in-situ paradigm, where large-scale simulations run concurrently alongside data analytics. With in-situ processing, data generated by simulations can be processed while still in memory, thereby avoiding the slow storage bottleneck. However, running simulations and analytics together on shared resources will likely result in substantial contention if left unmanaged, as demonstrated in this work, leading to significantly reduced efficiency of both simulations and analytics. Recently, virtualization technologies such as Linux containers have been widely applied in data centers and physical clusters to provide highly efficient and elastic resource provisioning for consolidated workloads, including scientific simulations and data analytics. In this paper, we investigate how to facilitate network traffic manipulation and reduce mutual interference on the network for in-situ applications in virtual clusters. To dynamically allocate network bandwidth where it is needed, we adopt SARIMA-based techniques to analyze and predict MPI traffic issued by simulations. Although this can be an effective technique, the naïve use of network virtualization can lead to performance degradation for bursty asynchronous transmissions within an MPI job. We analyze and resolve this performance degradation in virtual clusters.
AB - As a result of increasing data volume and velocity, Big Data science at exascale has shifted towards the in-situ paradigm, where large-scale simulations run concurrently alongside data analytics. With in-situ processing, data generated by simulations can be processed while still in memory, thereby avoiding the slow storage bottleneck. However, running simulations and analytics together on shared resources will likely result in substantial contention if left unmanaged, as demonstrated in this work, leading to significantly reduced efficiency of both simulations and analytics. Recently, virtualization technologies such as Linux containers have been widely applied in data centers and physical clusters to provide highly efficient and elastic resource provisioning for consolidated workloads, including scientific simulations and data analytics. In this paper, we investigate how to facilitate network traffic manipulation and reduce mutual interference on the network for in-situ applications in virtual clusters. To dynamically allocate network bandwidth where it is needed, we adopt SARIMA-based techniques to analyze and predict MPI traffic issued by simulations. Although this can be an effective technique, the naïve use of network virtualization can lead to performance degradation for bursty asynchronous transmissions within an MPI job. We analyze and resolve this performance degradation in virtual clusters.
KW - ARIMA
KW - In-situ applications
KW - MPI
KW - collective communication
KW - virtual network
KW - virtual switch
UR - http://www.scopus.com/inward/record.url?scp=85052620422&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052620422&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2018.2867879
DO - 10.1109/TPDS.2018.2867879
M3 - Article
AN - SCOPUS:85052620422
SN - 1045-9219
VL - 30
SP - 615
EP - 629
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 3
M1 - 8451897
ER -