Stylistic Language Drives Perceived Moral Superiority of LLMs

Kalil Warren, Chandler Nichols, Dawson Petersen, Valerie L Shalin, Amit Almor

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we respond to a recent article published in Scientific Reports by Aharoni et al., "Attributions toward artificial agents in a modified Moral Turing Test." Aharoni et al. tested how humans evaluate the quality of moral reasoning in human-generated and LLM-generated responses to moral questions. The human responses were sourced from university undergraduates, while the LLM responses were generated using OpenAI's ChatGPT-4. The prompts used to elicit the responses asked whether and why certain actions were morally wrong or acceptable. Ten pairs of human-generated and LLM-generated responses then served as stimuli in a modified Moral Turing Test (m-MTT), in which a separate group of human participants rated the quality of these responses. Participants rated the LLM-generated stimuli higher in moral virtuousness, trustworthiness, and intelligence; nevertheless, they were able to distinguish the human-generated from the LLM-generated responses.
Original language: English
Pages (from-to): 39168
Journal: Scientific Reports
Volume: 15
Issue number: 1
DOIs
State: Published - Nov 7 2025
