TY - GEN
T1 - LLWRA
T2 - 2025 International Conference on Control, Automation and Diagnosis, ICCAD 2025
AU - Almalky, Abeer
AU - Ahmed, Sabbir
AU - Zhou, Ranyang
AU - Nahian, Mohaiminul Al
AU - Arafat, Abdullah Al
AU - Angizi, Shaahin
AU - Rakin, Adnan Siraj
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - The enormous size of large language models (LLMs) makes storing an entire model in on-chip memory impractical. LLM applications therefore require external off-chip memory to store model weights, exposing those weights to memory fault injection attacks. In this work, we introduce a novel approach to compromising the performance of LLMs by exploiting a fault injection mechanism that introduces targeted bit-flips in the page frame numbers of main memory. In main memory, each weight block consists of a set of weights stored at a specific address. As a result, a single bit-flip in a page frame number can replace a target weight block with a new replacement weight block, disrupting the memory translation. However, the algorithmic challenge of creating a formal attack lies in the fact that random weight replacement faults fail to produce detrimental effects on model performance. We propose LLWRA, which, for the first time, effectively utilizes weight replacement fault injection to degrade the intelligence of state-of-the-art LLMs. Additionally, we present the ReBlock search algorithm, which efficiently identifies a set of vulnerable target and replacement weight blocks. We evaluate LLWRA across three distinct attack objectives: untargeted classification, targeted classification, and untargeted causal modeling. Experimental results demonstrate that LLWRA requires fewer than five attack optimization rounds to reduce classification accuracy to a random-guess level and fewer than nine iterations to reduce the causal model to a random generator, making our attack the most lethal weight manipulation attack against LLMs.
KW - bit-flip attack
KW - large language models
KW - weight replacement attack
UR - https://www.scopus.com/pages/publications/105014519938
DO - 10.1109/ICCAD64771.2025.11099200
M3 - Conference contribution
AN - SCOPUS:105014519938
T3 - 2025 International Conference on Control, Automation and Diagnosis, ICCAD 2025
BT - 2025 International Conference on Control, Automation and Diagnosis, ICCAD 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 July 2025 through 3 July 2025
ER -