TY - GEN
T1 - Casper
T2 - 12th IEEE International Conference on Cyber Security and Cloud Computing, CSCloud 2025
AU - Chong, Chun Jie
AU - Hou, Chenxi
AU - Yao, Zhihao
AU - Talebi, Seyed Mohammadjavad Seyed
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Web-based Large Language Model (LLM) services have been widely adopted and have become an integral part of our Internet experience. Third-party plugins enhance the functionalities of LLMs by enabling access to real-world data and services. However, the privacy consequences associated with these services and their third-party plugins are not well understood. Sensitive prompt data are stored, processed, and shared by cloud-based LLM providers and third-party plugins. In this paper, we propose Casper, a prompt sanitization technique that aims to protect user privacy by detecting and pseudonymizing sensitive information in user inputs before sending them to LLM services. Casper runs entirely on the user's device as a browser extension and does not require any changes to the online LLM services. At the core of Casper is a three-layered sanitization mechanism consisting of a rule-based filter, a Machine Learning (ML)-based named entity recognizer, and a browser-based local LLM topic identifier. We evaluate Casper on a dataset of 4500 synthesized prompts and 2000 real-world prompts. The results show that Casper can effectively filter out Personally Identifiable Information (PII) with an accuracy of 92.6%, and detect privacy-sensitive topics with an accuracy ranging from 92.5% to 94.0%. Furthermore, Casper successfully pseudonymized 92.0% of the sensitive information in the prompts while ensuring that the LLM's responses to the sanitized prompts remained moderately similar to those for the original prompts, with a cosine similarity of 0.538.
AB - Web-based Large Language Model (LLM) services have been widely adopted and have become an integral part of our Internet experience. Third-party plugins enhance the functionalities of LLMs by enabling access to real-world data and services. However, the privacy consequences associated with these services and their third-party plugins are not well understood. Sensitive prompt data are stored, processed, and shared by cloud-based LLM providers and third-party plugins. In this paper, we propose Casper, a prompt sanitization technique that aims to protect user privacy by detecting and pseudonymizing sensitive information in user inputs before sending them to LLM services. Casper runs entirely on the user's device as a browser extension and does not require any changes to the online LLM services. At the core of Casper is a three-layered sanitization mechanism consisting of a rule-based filter, a Machine Learning (ML)-based named entity recognizer, and a browser-based local LLM topic identifier. We evaluate Casper on a dataset of 4500 synthesized prompts and 2000 real-world prompts. The results show that Casper can effectively filter out Personally Identifiable Information (PII) with an accuracy of 92.6%, and detect privacy-sensitive topics with an accuracy ranging from 92.5% to 94.0%. Furthermore, Casper successfully pseudonymized 92.0% of the sensitive information in the prompts while ensuring that the LLM's responses to the sanitized prompts remained moderately similar to those for the original prompts, with a cosine similarity of 0.538.
KW - Large Language Model
KW - Web Privacy
UR - https://www.scopus.com/pages/publications/105030342314
U2 - 10.1109/CSCloud66326.2025.00027
DO - 10.1109/CSCloud66326.2025.00027
M3 - Conference contribution
AN - SCOPUS:105030342314
T3 - Proceedings - 2025 IEEE 12th International Conference on Cyber Security and Cloud Computing, CSCloud 2025
SP - 122
EP - 131
BT - Proceedings - 2025 IEEE 12th International Conference on Cyber Security and Cloud Computing, CSCloud 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 November 2025 through 9 November 2025
ER -