TY - GEN
T1 - ReD-LUT
T2 - 41st IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2022
AU - Zhou, Ranyang
AU - Roohi, Arman
AU - Misra, Durga
AU - Angizi, Shaahin
N1 - Publisher Copyright:
© 2022 Association for Computing Machinery.
PY - 2022/10/30
Y1 - 2022/10/30
N2 - In this paper, we propose a reconfigurable processing-in-DRAM architecture named ReD-LUT leveraging the high density of commodity main memory to enable a flexible, general-purpose, and massively parallel computation. ReD-LUT supports lookup table (LUT) queries to efficiently execute complex arithmetic operations (e.g., multiplication, division, etc.) via only memory read operation. In addition, ReD-LUT enables bulk bit-wise in-memory logic by elevating the analog operation of the DRAM sub-array to implement Boolean functions between operands stored in the same bit-line beyond the scope of prior DRAM-based proposals. We explore the efficacy of ReD-LUT in two computationally-intensive applications, i.e., low-precision deep learning acceleration, and the Advanced Encryption Standard (AES) computation. Our circuit-to-architecture simulation results show that for a quantized deep learning workload, ReD-LUT reduces the energy consumption per image by a factor of 21.4× compared with the GPU and achieves ∼37.8× speedup and 2.1× energy-efficiency over the best in-DRAM bit-wise accelerators. As for AES data-encryption, it reduces energy consumption by a factor of ∼2.2× compared to an ASIC implementation.
AB - In this paper, we propose a reconfigurable processing-in-DRAM architecture named ReD-LUT leveraging the high density of commodity main memory to enable a flexible, general-purpose, and massively parallel computation. ReD-LUT supports lookup table (LUT) queries to efficiently execute complex arithmetic operations (e.g., multiplication, division, etc.) via only memory read operation. In addition, ReD-LUT enables bulk bit-wise in-memory logic by elevating the analog operation of the DRAM sub-array to implement Boolean functions between operands stored in the same bit-line beyond the scope of prior DRAM-based proposals. We explore the efficacy of ReD-LUT in two computationally-intensive applications, i.e., low-precision deep learning acceleration, and the Advanced Encryption Standard (AES) computation. Our circuit-to-architecture simulation results show that for a quantized deep learning workload, ReD-LUT reduces the energy consumption per image by a factor of 21.4× compared with the GPU and achieves ∼37.8× speedup and 2.1× energy-efficiency over the best in-DRAM bit-wise accelerators. As for AES data-encryption, it reduces energy consumption by a factor of ∼2.2× compared to an ASIC implementation.
UR - http://www.scopus.com/inward/record.url?scp=85144245602&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85144245602&partnerID=8YFLogxK
U2 - 10.1145/3508352.3549469
DO - 10.1145/3508352.3549469
M3 - Conference contribution
AN - SCOPUS:85144245602
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
BT - Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2022
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 October 2022 through 4 November 2022
ER -