Prompt Wrangling: On Replication and Generalization in Large Language Models for PCG Levels

Arash Moradi Karkaj, Mark J. Nelson, Ioannis Koutis, Amy K. Hoover

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The ChatGPT4PCG competition calls for participants to submit inputs to ChatGPT, or prompts, that guide its output toward instructions for generating levels as sequences of Tetris-like block drops. Submitted prompts are sent to ChatGPT to generate levels that resemble letters of the English alphabet. Levels are evaluated based on their similarity to the target letter and their physical stability in the game engine. This provides a quantitative evaluation setting for prompt-based procedural content generation (PCG), an approach that has been gaining popularity in PCG, as in other areas of generative AI. This paper focuses on replicating and generalizing the competition results. The replication experiments first test whether the number of responses gathered from ChatGPT is sufficient to account for its stochasticity. We requery the original prompt submissions and re-run the competition using the original scripts, but on our own machines, about six months after the competition organizers did, and with varying sample sizes. We find that results largely replicate, except that two of the 15 submissions do much better in our replication, for reasons we can only partly determine. When it comes to generalization, we notice that the top-performing prompt has instructions for all 26 target levels hardcoded, which is at odds with the PCGML goal of generating new, previously unseen content from examples. We perform experiments in a more restricted few-shot prompting scenario, and find that generalization remains a challenge for current approaches.

Original language: English (US)
Title of host publication: Proceedings of the 19th International Conference on the Foundations of Digital Games, FDG 2024
Editors: Gillian Smith, Jim Whitehead, Ben Samuel, Katta Spiel, Riemer van Rozen
Publisher: Association for Computing Machinery
ISBN (Electronic): 9798400709555
State: Published - May 21 2024
Event: 19th International Conference on the Foundations of Digital Games, FDG 2024 - Worcester, United States
Duration: May 21 2024 to May 24 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 19th International Conference on the Foundations of Digital Games, FDG 2024
Country/Territory: United States
City: Worcester
Period: 5/21/24 to 5/24/24

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Keywords

  • Evaluating Generalization
  • Generalizability
  • Large Language Models (LLMs)
  • Procedural content generation (PCG)
  • Science Birds
