Project Details
Description
High-performance computing (HPC) is moving rapidly to the unparalleled level of exaflops in 2021, when the first exascale systems will be ready for science production. Despite the peak performance obtained by simplistic benchmarks during maintenance windows, when no other users are allowed to access the system, applications routinely suffer from performance variations as a result of intra- or inter-application interference over storage and network. The consequence is low system utilization and prolonged time to insight for applications. To address this challenge, this project aims to develop new methods in memory and input/output (I/O) that can significantly reduce the performance variation for large scientific applications. This project provides integrated research and education activities to nurture next-generation computer researchers and engineers in the area of HPC, particularly those from under-represented groups, to strengthen U.S. competitiveness in computational science and engineering.
This project aims to address the performance variation issue on HPC systems using a novel application-centric approach across the system stack. To address increasing resource contention, a selective hint-sharing scheme is designed to reduce the overall performance variation, and a cluster-partition technique is developed to regulate the scale of hint sharing. In addition, a feedback mechanism is incorporated to adjust the hint traffic according to the degree of performance-variation reduction. Based upon memory-access similarity, memory pages or work nodes sharing high similarity are grouped together to optimize the memory-system performance. Furthermore, a rule-based I/O re-routing scheme is proposed, in which I/O traffic is re-routed based not only on the interference profile but also on the requirements of downstream data analytics. In particular, an error-bounded coarsening technique that reacts to performance variation by adjusting the fidelity of an HPC application is explored. The integrated research activities in this project will significantly improve the understanding of, and methods for, managing performance variations for large computational science and engineering applications.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status | Active |
---|---|
Effective start/end date | 1/1/22 → 12/31/25 |
Funding
- National Science Foundation: $189,954.00