Abstract
Large language models (LLMs) are capable of writing grammatical text that follows instructions, answers questions, and solves problems. As they have advanced, it has become difficult to distinguish their output from human-written text. While past research has found some differences in features such as word choice and punctuation and developed classifiers to detect LLM output, none has studied the rhetorical styles of LLMs. Using several variants of Llama 3 and GPT-4o, we construct two parallel corpora of human- and LLM-written texts from common prompts. Using Douglas Biber’s set of lexical, grammatical, and rhetorical features, we identify systematic differences between LLMs and humans and between different LLMs. These differences persist when moving from smaller models to larger ones and are larger for instruction-tuned models than base models. This observation of differences demonstrates that despite their advanced abilities, LLMs struggle to match human stylistic variation. Attention to more advanced linguistic features can hence detect patterns in their behavior not previously recognized.
| Field | Value |
|---|---|
| Original language | English (US) |
| Article number | e2422455122 |
| Journal | Proceedings of the National Academy of Sciences of the United States of America |
| Volume | 122 |
| Issue number | 8 |
| State | Published - Feb 25 2025 |
All Science Journal Classification (ASJC) codes
- General
Keywords
- corpus linguistics
- large language models
- writing style
Title
- Do LLMs write like humans? Variation in grammatical and rhetorical styles