TY - GEN
T1 - PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)
T2 - 31st ACM SIGSAC Conference on Computer and Communications Security, CCS 2024
AU - Nazzal, Mahmoud
AU - Khalil, Issa
AU - Khreishah, Abdallah
AU - Phan, Hai
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2024/12/9
Y1 - 2024/12/9
AB - The ability to generate high-quality source code using large language models (LLMs) reduces software development time and costs. However, recent literature and our empirical investigation in this work show that while LLMs can generate functioning code, they inherently tend to introduce security vulnerabilities, limiting their potential. This problem is mainly due to their training on massive open-source corpora that exhibit insecure and inefficient programming practices. Therefore, automatically optimizing LLM prompts to generate secure and functioning code is a pressing need. This paper introduces PromSec, an algorithm that optimizes prompts for secure and functioning code generation with LLMs. PromSec combines 1) code vulnerability clearing using a generative adversarial graph neural network, dubbed gGAN, to fix and reduce security vulnerabilities in generated code, and 2) code generation using an LLM, in an interactive loop in which the output of the gGAN drives the LLM with enhanced prompts to generate secure code while preserving its functionality. By introducing a new contrastive learning approach in the gGAN, we formulate the code clearing and generation loop as a dual-objective optimization problem, enabling PromSec to notably reduce the number of LLM inferences. As a result, PromSec is a cost-effective and practical solution for generating secure and functioning code. Extensive experiments on Python and Java code datasets confirm that PromSec effectively enhances code security while preserving the intended functionality. Our experiments also show that even a comprehensive application of a state-of-the-art approach falls short of addressing all vulnerabilities in the code, whereas PromSec resolves each of them. Moreover, PromSec achieves more than an order-of-magnitude reduction in operational time, number of LLM queries, and security analysis costs. Furthermore, prompts optimized with PromSec for one LLM transfer to other LLMs, carry across programming languages, and generalize to vulnerabilities unseen during training. This study is an essential step toward improving the trustworthiness of LLMs for secure and functioning code generation, supporting their large-scale integration into real-world software development practices.
KW - LLMs
KW - code generation
KW - graph generative adversarial networks
KW - secure and functioning codes
UR - http://www.scopus.com/inward/record.url?scp=85215532935&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85215532935&partnerID=8YFLogxK
DO - 10.1145/3658644.3690298
M3 - Conference contribution
AN - SCOPUS:85215532935
T3 - CCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security
SP - 2266
EP - 2279
BT - CCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security
PB - Association for Computing Machinery, Inc
Y2 - 14 October 2024 through 18 October 2024
ER -