TY - GEN
T1 - Processing-in-Memory Acceleration of MAC-based Applications Using Residue Number System
T2 - 31st Great Lakes Symposium on VLSI, GLSVLSI 2021
AU - Angizi, Shaahin
AU - Roohi, Arman
AU - Taheri, Mohammadreza
AU - Fan, Deliang
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/6/22
Y1 - 2021/6/22
N2 - Processing-in-memory (PIM) has emerged as a viable solution to the memory wall crisis and has attracted great interest for accelerating computationally intensive AI applications ranging from filtering to complex neural networks. In this paper, we take advantage of both PIM and the residue number system (RNS), an alternative to the conventional binary number representation, to accelerate multiplication-and-accumulation (MAC) operations, the primary operations of the target applications. The PIM architecture exploits the maximum internal bandwidth of memory chips to perform local, parallel computation and eliminate off-chip data transfer. Moreover, RNS limits inter-digit carry propagation by performing arithmetic operations on small residues independently and in parallel. Thus, we develop a PIM-RNS design, entitled PRIMS, and analyze the potential of intertwining the PIM architecture with the inherent parallelism of RNS arithmetic to delineate the opportunities and challenges. To this end, we build a comprehensive device-to-architecture evaluation framework to quantitatively study this problem, considering the impact of PIM technology, with a well-known three-moduli set as a case study.
AB - Processing-in-memory (PIM) has emerged as a viable solution to the memory wall crisis and has attracted great interest for accelerating computationally intensive AI applications ranging from filtering to complex neural networks. In this paper, we take advantage of both PIM and the residue number system (RNS), an alternative to the conventional binary number representation, to accelerate multiplication-and-accumulation (MAC) operations, the primary operations of the target applications. The PIM architecture exploits the maximum internal bandwidth of memory chips to perform local, parallel computation and eliminate off-chip data transfer. Moreover, RNS limits inter-digit carry propagation by performing arithmetic operations on small residues independently and in parallel. Thus, we develop a PIM-RNS design, entitled PRIMS, and analyze the potential of intertwining the PIM architecture with the inherent parallelism of RNS arithmetic to delineate the opportunities and challenges. To this end, we build a comprehensive device-to-architecture evaluation framework to quantitatively study this problem, considering the impact of PIM technology, with a well-known three-moduli set as a case study.
KW - multiplication-and-accumulation
KW - processing-in-memory
KW - residue number system
UR - http://www.scopus.com/inward/record.url?scp=85109209243&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85109209243&partnerID=8YFLogxK
U2 - 10.1145/3453688.3461529
DO - 10.1145/3453688.3461529
M3 - Conference contribution
AN - SCOPUS:85109209243
T3 - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
SP - 265
EP - 270
BT - GLSVLSI 2021 - Proceedings of the 2021 Great Lakes Symposium on VLSI
PB - Association for Computing Machinery
Y2 - 22 June 2021 through 25 June 2021
ER -