TY - GEN
T1 - TPU-Gen
T2 - 1st IEEE International Conference on LLM-Aided Design, ICLAD 2025
AU - Vungarala, Deepak
AU - Elbtity, Mohammed Essa
AU - Pandit, Kartik
AU - Syed, Sumiya
AU - Alam, Sakila
AU - Ghosh, Arnob
AU - Zand, Ramtin
AU - Angizi, Shaahin
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficiency requirements. Nevertheless, designing an optimal TPU remains challenging due to the high level of domain expertise required, the considerable manual design time, and the lack of high-quality, domain-specific datasets. This paper introduces TPU-Gen, the first Large Language Model (LLM)-based framework designed to automate the exact and approximate TPU generation process, focusing on systolic array architectures. TPU-Gen is supported by a meticulously curated, comprehensive, and open-source dataset that covers a wide range of spatial array designs and approximate multiply-and-accumulate units, enabling design reuse, adaptation, and customization for different DNN workloads. The proposed framework leverages Retrieval-Augmented Generation (RAG) as an effective solution for building LLMs in a data-scarce hardware domain, addressing the most pressing issue: hallucinations. TPU-Gen transforms high-level architectural specifications into optimized low-level implementations through an effective hardware generation pipeline. Our extensive experimental evaluations demonstrate superior performance, power, and area efficiency, with average reductions in area and power of 92% and 96%, respectively, relative to the manually optimized reference designs. These results set new standards for driving advancements in next-generation design automation tools powered by LLMs.
AB - The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficiency requirements. Nevertheless, designing an optimal TPU remains challenging due to the high level of domain expertise required, the considerable manual design time, and the lack of high-quality, domain-specific datasets. This paper introduces TPU-Gen, the first Large Language Model (LLM)-based framework designed to automate the exact and approximate TPU generation process, focusing on systolic array architectures. TPU-Gen is supported by a meticulously curated, comprehensive, and open-source dataset that covers a wide range of spatial array designs and approximate multiply-and-accumulate units, enabling design reuse, adaptation, and customization for different DNN workloads. The proposed framework leverages Retrieval-Augmented Generation (RAG) as an effective solution for building LLMs in a data-scarce hardware domain, addressing the most pressing issue: hallucinations. TPU-Gen transforms high-level architectural specifications into optimized low-level implementations through an effective hardware generation pipeline. Our extensive experimental evaluations demonstrate superior performance, power, and area efficiency, with average reductions in area and power of 92% and 96%, respectively, relative to the manually optimized reference designs. These results set new standards for driving advancements in next-generation design automation tools powered by LLMs.
KW - accelerators
KW - large language model
KW - retrieval-augmented generation
KW - tensor processing unit
UR - https://www.scopus.com/pages/publications/105015886296
U2 - 10.1109/ICLAD65226.2025.00010
DO - 10.1109/ICLAD65226.2025.00010
M3 - Conference contribution
AN - SCOPUS:105015886296
T3 - Proceedings - 2025 IEEE International Conference on LLM-Aided Design, ICLAD 2025
SP - 1
EP - 8
BT - Proceedings - 2025 IEEE International Conference on LLM-Aided Design, ICLAD 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 26 June 2025 through 27 June 2025
ER -