PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs)

Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, Hai Phan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

The capability of generating high-quality source code using large language models (LLMs) reduces software development time and costs. However, recent literature and our empirical investigation in this work show that while LLMs can generate functioning code, they inherently tend to introduce security vulnerabilities, limiting their potential. This problem is mainly due to their training on massive open-source corpora exhibiting insecure and inefficient programming practices. Therefore, automatic optimization of LLM prompts for generating secure and functioning code is a demanding need. This paper introduces PromSec, an algorithm for prompt optimization for secure and functioning code generation using LLMs. In PromSec, we combine 1) code vulnerability clearing using a generative adversarial graph neural network, dubbed as gGAN, to fix and reduce security vulnerabilities in generated codes and 2) code generation using an LLM into an interactive loop, such that the outcome of the gGAN drives the LLM with enhanced prompts to generate secure codes while preserving their functionality. Introducing a new contrastive learning approach in gGAN, we formulate the code-clearing and generation loop as a dual-objective optimization problem, enabling PromSec to notably reduce the number of LLM inferences. As a result, PromSec becomes a cost-effective and practical solution for generating secure and functioning codes. Extensive experiments conducted on Python and Java code datasets confirm that PromSec effectively enhances code security while upholding its intended functionality. Our experiments show that despite the comprehensive application of a state-of-the-art approach, it falls short in addressing all vulnerabilities within the code, whereas PromSec effectively resolves each of them. Moreover, PromSec achieves more than an order-of-magnitude reduction in operational time, number of LLM queries, and security analysis costs. Furthermore, prompts optimized with PromSec for a certain LLM are transferable to other LLMs across programming languages and generalizable to unseen vulnerabilities in training. This study presents an essential step towards improving the trustworthiness of LLMs for secure and functioning code generation, significantly enhancing their large-scale integration in real-world software code development practices.
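
The interactive loop described in the abstract (generate code, check it for vulnerabilities, feed the outcome back into the prompt) can be pictured with a minimal Python sketch. This is not the authors' implementation: it does not model the gGAN, the contrastive learning objective, or the dual-objective formulation. Every name here (`optimize_prompt`, `toy_generate`, `toy_find_issues`, `toy_revise`) is hypothetical, and the three toy functions are stand-ins for an LLM, a security analyzer, and a prompt-revision step, respectively.

```python
from typing import Callable, List, Tuple


def optimize_prompt(
    prompt: str,
    generate: Callable[[str], str],
    find_issues: Callable[[str], List[str]],
    revise_prompt: Callable[[str, List[str]], str],
    max_rounds: int = 5,
) -> Tuple[str, str, List[str], int]:
    """Generic generate -> analyze -> revise loop (illustrative only).

    Returns the prompt that produced the final code, the code itself,
    any issues still reported for it, and the number of generator calls.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    for llm_calls in range(1, max_rounds + 1):
        code = generate(prompt)
        issues = find_issues(code)
        # Stop when the code is clean or the call budget is used up.
        if not issues or llm_calls == max_rounds:
            return prompt, code, issues, llm_calls
        prompt = revise_prompt(prompt, issues)
    raise AssertionError("unreachable")


# --- Toy stand-ins (assumptions, not real models or analyzers) ---

def toy_generate(prompt: str) -> str:
    """Pretend LLM: emits a safer snippet only when the prompt asks for it."""
    if "Avoid shell=True" in prompt:
        return "subprocess.run(shlex.split(cmd), check=True)"
    return "subprocess.run(cmd, shell=True)"


def toy_find_issues(code: str) -> List[str]:
    """Pretend analyzer: flags a single risky pattern by substring match."""
    return ["shell=True"] if "shell=True" in code else []


def toy_revise(prompt: str, issues: List[str]) -> str:
    """Pretend prompt refinement: appends an instruction naming the issues."""
    return f"{prompt} Avoid {', '.join(issues)}."


final_prompt, final_code, remaining, calls = optimize_prompt(
    "Write a function that runs a command.",
    toy_generate,
    toy_find_issues,
    toy_revise,
)
```

In this toy run the first generated snippet is flagged, the prompt is revised once, and the second snippet passes the check. The `max_rounds` budget mirrors the abstract's concern with keeping the number of LLM queries low, although how PromSec actually achieves that reduction is not captured here.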

Original language: English (US)
Title of host publication: CCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security
Publisher: Association for Computing Machinery, Inc
Pages: 2266-2279
Number of pages: 14
ISBN (Electronic): 9798400706363
State: Published - Dec 9 2024
Event: 31st ACM SIGSAC Conference on Computer and Communications Security, CCS 2024 - Salt Lake City, United States
Duration: Oct 14 2024 - Oct 18 2024

Publication series

Name: CCS 2024 - Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security

Conference

Conference: 31st ACM SIGSAC Conference on Computer and Communications Security, CCS 2024
Country/Territory: United States
City: Salt Lake City
Period: 10/14/24 - 10/18/24

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Software

Keywords

  • LLMs
  • code generation
  • graph generative adversarial networks
  • secure and functioning codes
