DeepCompress-ViT: Rethinking Model Compression to Enhance Efficiency of Vision Transformers at the Edge

Sabbir Ahmed, Abdullah Al Arafat, Deniz Najafi, Akhlak Mahmood, Mamshad Nayeem Rizve, Mohaiminul Al Nahian, Ranyang Zhou, Shaahin Angizi, Adnan Siraj Rakin

Research output: Contribution to journal › Conference article › peer-review

Abstract

Vision Transformers (ViTs) excel at tackling complex vision tasks, yet their substantial size poses significant challenges for deployment on resource-constrained edge devices. The large size of these models leads to higher overhead (e.g., energy, latency) when transmitting model weights between the edge device and the server. Hence, ViTs are not ideal for edge devices where the entire model may not fit on the device. Current model compression techniques often achieve high compression ratios at the expense of performance degradation, particularly for ViTs. To overcome the limitations of existing works, we rethink the model compression strategy for ViTs from first principles and develop an orthogonal strategy called DeepCompress-ViT. The objective of DeepCompress-ViT is to encode the model weights into a highly compressed representation using a novel training method, denoted as Unified Compression Training (UCT). The proposed UCT is accompanied by a decoding mechanism during inference, which helps to recover any loss of accuracy due to the high compression ratio. We further optimize this decoding step by reordering the decoding operation using the associative property of matrix multiplication, ensuring that the compressed weights can be decoded during inference without incurring any computational overhead. Our extensive experiments across multiple ViT models on modern edge devices show that DeepCompress-ViT can compress ViTs at high compression ratios (>14×). DeepCompress-ViT enables the entire model to be stored on the edge device, resulting in unprecedented reductions in energy consumption (>1470×) and latency (>68×) for edge ViT inference. Our code is available at https://github.com/ML-Security-Research-LAB/DeepCompress-ViT.
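
A minimal sketch of the associativity reordering described above, assuming (purely for illustration) that the compressed representation is a low-rank factorization W ≈ U·V; the paper's actual UCT encoding may differ, and the names U, V, and rank r here are hypothetical:

  import torch

  d_out, d_in, r, batch = 768, 768, 64, 16

  # Hypothetical compressed form of a weight matrix W ≈ U @ V,
  # where r << d_out, d_in provides the compression.
  U = torch.randn(d_out, r)   # decoder factor
  V = torch.randn(r, d_in)    # encoded (compressed) weights
  x = torch.randn(d_in, batch)

  # Naive inference: explicitly decode W, then apply it.
  W = U @ V            # materializes the full d_out x d_in matrix
  y_naive = W @ x

  # Reordered inference via associativity: (U @ V) @ x == U @ (V @ x).
  # W is never materialized, so decoding adds no memory overhead and
  # costs fewer FLOPs whenever r is small.
  y_fast = U @ (V @ x)

  assert torch.allclose(y_naive, y_fast, rtol=1e-4, atol=1e-3)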

Original language: English (US)
Pages (from-to): 30147-30156
Number of pages: 10
Journal: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
DOIs
State: Published - 2025
Event: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 - Nashville, United States
Duration: Jun 11, 2025 - Jun 15, 2025

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition

Keywords

  • efficient ai
  • efficient edge inference
  • model compression
  • vision transformer
