A comprehensive study of in-memory computing on large HPC systems

Dan Huang, Zhenlu Qin, Qing Liu, Norbert Podhorszki, Scott Klasky

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Scopus citations

Abstract

With the increasing fidelity and resolution enabled by high-performance computing systems, simulation-based scientific discovery is able to model and understand microscopic physical phenomena at a level that was not possible in the past. A grand challenge facing the HPC community is how to handle the large amounts of analysis data generated from simulations. In-memory computing, among others, is recognized to be a viable path forward and has experienced tremendous success in the past decade. Nevertheless, there has been a lack of a complete study and understanding of in-memory computing as a whole on HPC systems. This paper presents a comprehensive study, which goes well beyond the typical performance metrics. In particular, we assess in-memory computing with regard to its usability, portability, robustness and internal design trade-offs, which are the key factors of interest to domain scientists. We use two realistic scientific workflows, LAMMPS and Laplace, to conduct comprehensive studies on state-of-the-art in-memory computing libraries, including DataSpaces, DIMES, Flexpath and Decaf. We conduct cross-platform experiments at scale on two leading supercomputers, Titan at ORNL and Cori at NERSC, and summarize our key findings in this critical area.

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE 40th International Conference on Distributed Computing Systems, ICDCS 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 987-997
Number of pages: 11
ISBN (Electronic): 9781728170022
State: Published - Nov 2020
Event: 40th IEEE International Conference on Distributed Computing Systems, ICDCS 2020 - Singapore, Singapore
Duration: Nov 29 2020 to Dec 1 2020

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2020-November

Conference

Conference: 40th IEEE International Conference on Distributed Computing Systems, ICDCS 2020
Country/Territory: Singapore
City: Singapore
Period: 11/29/20 to 12/1/20

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Keywords

  • Data analytics
  • High-performance computing
  • In-memory computing
  • Workflow
