High-Ratio Lossy Compression: Exploring the Autoencoder to Compress Scientific Data

Tong Liu, Jinzhen Wang, Qing Liu, Shakeel Alibhai, Tao Lu, Xubin He

Research output: Contribution to journal › Article › peer-review

9 Scopus citations


Scientific simulations on high-performance computing (HPC) systems can generate large amounts of floating-point data per run. Compared with lossless compressors, lossy compressors such as SZ and ZFP can reduce data volume more aggressively while maintaining the usefulness of the data. However, a reduction ratio of more than two orders of magnitude is almost impossible without seriously distorting the data. In deep learning, the autoencoder has shown potential for data compression. Whether the autoencoder can deliver similar performance on scientific data, however, is unknown. In this paper, we conduct, for the first time, a comprehensive study on the use of autoencoders to compress real-world scientific data and illustrate several key findings on using autoencoders for scientific data reduction. We implement an autoencoder-based compression prototype to reduce floating-point data. Our study shows that the out-of-the-box implementation needs further tuning to achieve high compression ratios and satisfactory error bounds. Our evaluation results show that, for most test datasets, the tuned autoencoder outperforms SZ by 2 to 4x and ZFP by 10 to 50x in compression ratio. The practices and lessons learned in this work can guide future optimizations for using autoencoders to compress scientific data.
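The two quantities the abstract evaluates, compression ratio and reconstruction error, can be illustrated with a minimal sketch. This is not the paper's prototype: it assumes a purely linear autoencoder (fitted in closed form with an SVD, which is equivalent to PCA) as a stand-in for a trained neural network, uses synthetic data, and counts only the latent codes toward the compressed size, ignoring the cost of storing the model. All function names are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(blocks, latent_dim):
    """Fit a linear encoder/decoder pair in closed form (SVD, i.e. PCA).

    Assumption: a linear model stands in for a trained neural autoencoder.
    """
    mean = blocks.mean(axis=0)
    # Rows of vt are the principal directions of the centered blocks.
    _, _, vt = np.linalg.svd(blocks - mean, full_matrices=False)
    basis = vt[:latent_dim]  # shape: (latent_dim, block_size)
    return mean, basis


def encode(blocks, mean, basis):
    """Project each block onto the latent basis (the compressed form)."""
    return ((blocks - mean) @ basis.T).astype(np.float32)


def decode(latent, mean, basis):
    """Reconstruct blocks from their latent codes."""
    return latent @ basis + mean


def compression_ratio(original, latent):
    """Original bytes divided by latent bytes (model size is ignored)."""
    return original.nbytes / latent.nbytes


def max_abs_error(original, reconstructed):
    """Largest pointwise absolute reconstruction error."""
    return float(np.max(np.abs(original - reconstructed)))


# Synthetic stand-in for a simulation field: a smooth signal plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 8.0 * np.pi, 64 * 256)
data = (np.sin(x) + 0.01 * rng.standard_normal(x.size)).astype(np.float32)

blocks = data.reshape(-1, 64)  # 256 blocks of 64 values each
mean, basis = fit_linear_autoencoder(blocks, latent_dim=4)
latent = encode(blocks, mean, basis)
recon = decode(latent, mean, basis)

ratio = compression_ratio(blocks, latent)
err = max_abs_error(blocks, recon)
print(f"compression ratio: {ratio:.1f}, max abs error: {err:.4f}")
```

Shrinking `latent_dim` raises the compression ratio and generally raises the error, which is the trade-off any lossy compressor has to manage against a required error bound.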

Original language: English (US)
Journal: IEEE Transactions on Big Data
State: Accepted/In press - 2021

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management


Keywords

  • Big Data
  • Compressors
  • Data models
  • Decoding
  • Image coding
  • Lossy data compression
  • Prototypes
  • Tuning
  • autoencoder
  • machine learning
  • scientific data

