TY - GEN
T1 - Prompt Wrangling
T2 - 19th International Conference on the Foundations of Digital Games, FDG 2024
AU - Moradi Karkaj, Arash
AU - Nelson, Mark J.
AU - Koutis, Ioannis
AU - Hoover, Amy K.
N1 - Publisher Copyright:
© 2024 Owner/Author.
PY - 2024/5/21
Y1 - 2024/5/21
N2 - The ChatGPT4PCG competition calls for participants to submit prompts, inputs to ChatGPT that guide its output toward instructions for generating levels as sequences of Tetris-like block drops. Each submitted prompt is used to query ChatGPT for levels that resemble letters of the English alphabet. Levels are evaluated on their similarity to the target letter and their physical stability in the game engine. This provides a quantitative evaluation setting for prompt-based procedural content generation (PCG), an approach that has been gaining popularity in PCG, as in other areas of generative AI. This paper focuses on replicating and generalizing the competition results. The replication experiments first test whether the number of responses gathered from ChatGPT is sufficient to account for its stochasticity. We requery ChatGPT with the original prompt submissions and rerun the competition's original scripts on our own machines, about six months after the competition organizers ran them, and with varying sample sizes. We find that results largely replicate, except that two of the 15 submissions do much better in our replication, for reasons we can only partly determine. When it comes to generalization, we notice that the top-performing prompt has instructions for all 26 target levels hardcoded, which is at odds with the PCGML goal of generating new, previously unseen content from examples. We perform experiments in a more restricted few-shot prompting scenario and find that generalization remains a challenge for current approaches.
AB - The ChatGPT4PCG competition calls for participants to submit prompts, inputs to ChatGPT that guide its output toward instructions for generating levels as sequences of Tetris-like block drops. Each submitted prompt is used to query ChatGPT for levels that resemble letters of the English alphabet. Levels are evaluated on their similarity to the target letter and their physical stability in the game engine. This provides a quantitative evaluation setting for prompt-based procedural content generation (PCG), an approach that has been gaining popularity in PCG, as in other areas of generative AI. This paper focuses on replicating and generalizing the competition results. The replication experiments first test whether the number of responses gathered from ChatGPT is sufficient to account for its stochasticity. We requery ChatGPT with the original prompt submissions and rerun the competition's original scripts on our own machines, about six months after the competition organizers ran them, and with varying sample sizes. We find that results largely replicate, except that two of the 15 submissions do much better in our replication, for reasons we can only partly determine. When it comes to generalization, we notice that the top-performing prompt has instructions for all 26 target levels hardcoded, which is at odds with the PCGML goal of generating new, previously unseen content from examples. We perform experiments in a more restricted few-shot prompting scenario and find that generalization remains a challenge for current approaches.
KW - Evaluating Generalization
KW - Generalizability
KW - Large Language Models (LLMs)
KW - Procedural content generation (PCG)
KW - Science Birds
UR - http://www.scopus.com/inward/record.url?scp=85199032754&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85199032754&partnerID=8YFLogxK
U2 - 10.1145/3649921.3659853
DO - 10.1145/3649921.3659853
M3 - Conference contribution
AN - SCOPUS:85199032754
T3 - ACM International Conference Proceeding Series
BT - Proceedings of the 19th International Conference on the Foundations of Digital Games, FDG 2024
A2 - Smith, Gillian
A2 - Whitehead, Jim
A2 - Samuel, Ben
A2 - Spiel, Katta
A2 - van Rozen, Riemer
PB - Association for Computing Machinery
Y2 - 21 May 2024 through 24 May 2024
ER -