TY - JOUR
T1 - Identifying challenges and opportunities of in-memory computing on large HPC systems
AU - Huang, Dan
AU - Qin, Zhenlu
AU - Liu, Qing
AU - Podhorszki, Norbert
AU - Klasky, Scott
N1 - Funding Information:
The authors wish to acknowledge the support from the US NSF under Grant No. CCF-1718297, CCF-1812861, CCF-2134202, and the NJIT research startup fund. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract DE-AC05-00OR22725.
Publisher Copyright:
© 2022 Elsevier Inc.
PY - 2022/6
Y1 - 2022/6
N2 - With the increasing fidelity and resolution enabled by high-performance computing systems, simulation-based scientific discovery is able to model and understand microscopic physical phenomena at a level that was not possible in the past. A grand challenge that the HPC community is facing is how to maintain the large amounts of analysis data generated from simulations. In-memory computing, among others, is recognized to be a viable path forward and has experienced tremendous success in the past decade. Nevertheless, there has been a lack of a complete study and understanding of in-memory computing as a whole on HPC systems. Given the widening disparity between compute and HPC storage I/O, it is urgent for the HPC community to assess the state of in-memory computing and understand the challenges and opportunities. This paper presents a comprehensive study of in-memory computing with regard to its software evolution, performance, usability, robustness, and portability. In particular, we conduct an in-depth analysis of the evolution of in-memory computing based upon more than 3,000 commits, and use realistic workflows for two scientific workloads, i.e., LAMMPS and Laplace, to quantitatively assess state-of-the-art in-memory computing libraries, including DataSpaces, DIMES, Flexpath, Decaf and SENSEI, on two leading supercomputers, Titan and Cori. Our studies not only illustrate the performance and scalability, but also reveal the key aspects that are of interest to library developers and users, including usability, robustness, portability, potential design defects, etc.
AB - With the increasing fidelity and resolution enabled by high-performance computing systems, simulation-based scientific discovery is able to model and understand microscopic physical phenomena at a level that was not possible in the past. A grand challenge that the HPC community is facing is how to maintain the large amounts of analysis data generated from simulations. In-memory computing, among others, is recognized to be a viable path forward and has experienced tremendous success in the past decade. Nevertheless, there has been a lack of a complete study and understanding of in-memory computing as a whole on HPC systems. Given the widening disparity between compute and HPC storage I/O, it is urgent for the HPC community to assess the state of in-memory computing and understand the challenges and opportunities. This paper presents a comprehensive study of in-memory computing with regard to its software evolution, performance, usability, robustness, and portability. In particular, we conduct an in-depth analysis of the evolution of in-memory computing based upon more than 3,000 commits, and use realistic workflows for two scientific workloads, i.e., LAMMPS and Laplace, to quantitatively assess state-of-the-art in-memory computing libraries, including DataSpaces, DIMES, Flexpath, Decaf and SENSEI, on two leading supercomputers, Titan and Cori. Our studies not only illustrate the performance and scalability, but also reveal the key aspects that are of interest to library developers and users, including usability, robustness, portability, potential design defects, etc.
KW - Data analytics
KW - High-performance computing
KW - In-memory computing
KW - Workflow
UR - http://www.scopus.com/inward/record.url?scp=85125870800&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125870800&partnerID=8YFLogxK
U2 - 10.1016/j.jpdc.2022.02.002
DO - 10.1016/j.jpdc.2022.02.002
M3 - Article
AN - SCOPUS:85125870800
SN - 0743-7315
VL - 164
SP - 106
EP - 122
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
ER -