Reducing the Training Overhead of the HPC Compression Autoencoder via Dataset Proportioning

Tong Liu, Shakeel Alibhai, Jinzhen Wang, Qing Liu, Xubin He

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

As the storage overhead of high-performance computing (HPC) data reaches into the petabyte or even exabyte scale, it could be useful to find new methods of compressing such data. The compression autoencoder (CAE) has recently been proposed to compress HPC data with a very high compression ratio. However, this machine learning-based method suffers from the major drawback of lengthy training time. In this paper, we attempt to mitigate this problem by proposing a proportioning scheme to reduce the amount of data that is used for training relative to the amount of data to be compressed. We show that this method drastically reduces the training time without, in most cases, significantly increasing the error. We further explain how this scheme can even improve the accuracy of the CAE on certain datasets. Finally, we provide some guidance on how to determine a suitable proportion of the training dataset to use in order to train the CAE for a given dataset.
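
As a rough illustration of the proportioning idea, the sketch below trains on only a fraction of the data that will later be compressed. The abstract does not say how the training subset is chosen, so uniform random sampling is an assumption here, and the names `select_training_subset`, `blocks`, and `proportion` are hypothetical rather than taken from the paper.

```python
import random


def select_training_subset(blocks, proportion, seed=0):
    """Return a random subset of `blocks` sized by `proportion`.

    Assumption: uniform random sampling without replacement. The paper's
    actual selection strategy is not described in the abstract.
    """
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    if not blocks:
        return []
    # Keep at least one block so that training always has some input.
    count = max(1, round(len(blocks) * proportion))
    rng = random.Random(seed)  # fixed seed makes the selection repeatable
    return rng.sample(blocks, count)


if __name__ == "__main__":
    # Hypothetical dataset: 1000 blocks of 8 floating-point values each.
    dataset = [[float(i + j) for j in range(8)] for i in range(1000)]
    training_blocks = select_training_subset(dataset, proportion=0.1)
    print(len(training_blocks), "of", len(dataset), "blocks used for training")
```

In such a setup the autoencoder would be trained on `training_blocks` only and then applied to the full dataset; the proportion trades training time against reconstruction error.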

Original language: English (US)
Title of host publication: 2021 IEEE International Conference on Networking, Architecture and Storage, NAS 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728177441
State: Published - 2021
Event: 15th IEEE International Conference on Networking, Architecture and Storage, NAS 2021 - Riverside, United States
Duration: Oct 24 2021 to Oct 26 2021

Publication series

Name: 2021 IEEE International Conference on Networking, Architecture and Storage, NAS 2021 - Proceedings

Conference

Conference: 15th IEEE International Conference on Networking, Architecture and Storage, NAS 2021
Country/Territory: United States
City: Riverside
Period: 10/24/21 to 10/26/21

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management

Keywords

  • Data compression
  • HPC
  • autoencoder
  • machine learning
  • training time
