Stability-preserving Lossy Compression for Large-scale Partial Differential Equations

  • Qian Gong
  • Mark Ainsworth
  • Jieyang Chen
  • Xin Liang
  • Liangji Zhu
  • Ethan Klasky
  • Tushar Athawale
  • Qing Liu
  • Anand Rangarajan
  • Sanjay Ranka
  • Scott Klasky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Checkpoint/Restart (C/R) strategies are vital for fault tolerance in scientific simulations based on partial differential equations (PDEs), yet traditional checkpointing incurs significant I/O overhead. Lossy compression offers a scalable solution by reducing checkpoint data size, but conventional methods often lack control over physical invariants (e.g., energy), leading to instability such as oscillations or divergence in PDE systems. This paper introduces a stability-preserving compression approach tailored for PDE simulations that explicitly controls kinetic and potential energy perturbations to ensure stable restarts. Extensive experiments conducted across diverse PDE configurations demonstrate that our method maintains numerical stability with minimal error magnification, even across multiple checkpoint-restart cycles, outperforming state-of-the-art lossy compressors. Parallel evaluations on the Frontier supercomputer show up to 8.4× improvement in checkpoint write performance and 6.3× in read performance, while maintaining relative L2 errors of ∼2e-6 throughout continued simulation. These results provide practical guidance for balancing compression accuracy, stability, and computational efficiency in large-scale PDE applications.

Original language: English (US)
Title of host publication: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025
Publisher: Association for Computing Machinery, Inc
Pages: 1992-2005
Number of pages: 14
ISBN (Electronic): 9798400714665
State: Published - Nov 15 2025
Event: 2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025 - St. Louis, United States
Duration: Nov 16 2025 to Nov 21 2025

Publication series

Name: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025

Conference

Conference: 2025 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2025
Country/Territory: United States
City: St. Louis
Period: 11/16/25 to 11/21/25

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Hardware and Architecture

Keywords

  • Checkpoint-restart
  • large-scale PDEs
  • lossy compression
  • stability preservation
