LLWRA: Large Language Models Weight Replacement Attack

  • Abeer Almalky
  • Sabbir Ahmed
  • Ranyang Zhou
  • Mohaiminul Al Nahian
  • Abdullah Al Arafat
  • Shaahin Angizi
  • Adnan Siraj Rakin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The enormous size of large language models (LLMs) makes storing an entire model in on-chip memory impractical. LLM deployments therefore rely on external off-chip memory to store model weights, exposing those weights to memory fault-injection attacks. In this work, we introduce a novel approach to compromising the performance of LLMs by exploiting a fault-injection mechanism that introduces targeted bit-flips in the page frame numbers of main memory. In main memory, each weight block consists of a set of weights stored at a specific address. As a result, a single bit-flip in a page frame number can replace a target weight block with a new replacement weight block, disrupting the memory translation. The algorithmic challenge in mounting a practical attack, however, is that random weight replacement faults fail to produce detrimental effects on model performance. We propose LLWRA, which, for the first time, effectively exploits weight replacement fault injection to degrade the intelligence of state-of-the-art LLMs. Additionally, we present the ReBlock search algorithm, which efficiently identifies a set of vulnerable target and replacement weight blocks. We evaluate LLWRA across three distinct attack objectives: untargeted classification, targeted classification, and untargeted causal modeling. Experimental results demonstrate that LLWRA requires fewer than five attack optimization rounds to reduce classification accuracy to a random-guess level and fewer than nine iterations to reduce the causal model to a random generator, making our attack the most lethal weight manipulation attack against LLMs to date.
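The core mechanism described above can be illustrated with a toy sketch. This is not the paper's implementation: the page table, frame contents, and the specific flipped bit are all hypothetical, chosen only to show how a single bit-flip in a page frame number (PFN) silently redirects a weight-block load to a different physical frame.

```python
import random

random.seed(0)

PAGE_SIZE = 4  # weights per block (toy value, hypothetical)

# Physical memory: frame number -> weight block stored in that frame.
physical_frames = {f: [random.random() for _ in range(PAGE_SIZE)]
                   for f in range(8)}

# Page table: virtual page number -> PFN (identity mapping for simplicity).
page_table = {vpn: vpn for vpn in range(8)}

def load_weight_block(vpn):
    """Translate a virtual page to a physical frame and fetch its weights."""
    return physical_frames[page_table[vpn]]

target_vpn = 2
original_block = list(load_weight_block(target_vpn))

# Fault injection: flip one bit of the target page's PFN (2 XOR 0b100 = 6).
# Every subsequent load of page 2 now returns frame 6's weights instead,
# replacing the target weight block without touching the weights themselves.
page_table[target_vpn] ^= 0b100

replacement_block = load_weight_block(target_vpn)
assert page_table[target_vpn] == 6
assert replacement_block != original_block  # block was silently swapped
```

The point of the sketch is that the weights in memory are never modified: only the translation changes, which is why a single bit-flip suffices to swap an entire block at once.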

Original language: English (US)
Title of host publication: 2025 International Conference on Control, Automation and Diagnosis, ICCAD 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331511913
DOIs
State: Published - 2025
Externally published: Yes
Event: 2025 International Conference on Control, Automation and Diagnosis, ICCAD 2025 - Barcelona, Spain
Duration: Jul 1 2025 - Jul 3 2025

Publication series

Name: 2025 International Conference on Control, Automation and Diagnosis, ICCAD 2025

Conference

Conference: 2025 International Conference on Control, Automation and Diagnosis, ICCAD 2025
Country/Territory: Spain
City: Barcelona
Period: 7/1/25 - 7/3/25

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Safety, Risk, Reliability and Quality
  • Control and Optimization
  • Modeling and Simulation

Keywords

  • bit-flip attack
  • large language models
  • weight replacement attack
