ViTSen: Bridging Vision Transformers and Edge Computing with Advanced In/Near-Sensor Processing

Sepehr Tabrizchi, Brendan C. Reidy, Deniz Najafi, Shaahin Angizi, Ramtin Zand, Arman Roohi

Research output: Contribution to journal › Article › peer-review

Abstract

This letter introduces ViTSen, which optimizes vision transformers (ViTs) for resource-constrained edge devices. It features an in-sensor image compression technique that effectively reduces data-conversion and transmission power costs. ViTSen further incorporates a ReRAM array that enables efficient near-sensor analog convolution. This integration, together with novel pixel-reading and peripheral circuitry, decreases the reliance on analog buffers and converters, significantly lowering power consumption. To make several established ViT algorithms compatible with ViTSen, they have undergone quantization and channel reduction. Circuit-to-application co-simulation results show that ViTSen maintains accuracy comparable to a full-precision baseline across various data precisions, achieving an efficiency of ∼3.1 TOp/s/W.
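The abstract mentions that the ViT models were quantized for compatibility with the in/near-sensor hardware, but the record does not specify the scheme. As a minimal illustrative sketch (not the letter's actual method), uniform symmetric per-tensor quantization is a common way to map floating-point weights to low-bit integer codes; all names and values below are hypothetical:

```python
def quantize_symmetric(weights, n_bits):
    """Sketch of uniform symmetric per-tensor quantization to n_bits.

    Maps each float weight to an integer code in [-qmax, qmax] using a
    single scale derived from the largest magnitude, then dequantizes
    back to floats for accuracy evaluation.
    """
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    # Round to the nearest code and clip to the representable range.
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized, scale

# Toy example: 4-bit quantization of a few illustrative weight values.
weights = [0.8, -1.2, 0.05, 0.31, -0.6, 1.5]
codes, approx, scale = quantize_symmetric(weights, n_bits=4)
print(codes)  # integer codes in [-7, 7]
```

At 4 bits the reconstruction error of each weight is bounded by half a quantization step (scale / 2), which is the kind of precision/accuracy trade-off the co-simulation results in the letter evaluate.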

Original language: English (US)
Pages (from-to): 341-344
Number of pages: 4
Journal: IEEE Embedded Systems Letters
Volume: 16
Issue number: 4
DOIs
State: Published - 2024

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • General Computer Science

Keywords

  • In-sensor processing (ISP)
  • Internet of Things (IoT)
  • vision transformer (ViT)

