Abstract
In this paper, we respond to a recent article in Scientific Reports by Aharoni et al.1 titled “Attributions toward artificial agents in a modified Moral Turing Test.” Aharoni et al. tested how humans evaluate the quality of moral reasoning in human-generated and LLM-generated responses to moral questions. The human responses were sourced from university undergraduates, while the LLM responses were generated using OpenAI’s ChatGPT-4. The prompts used to elicit the responses asked whether and why certain actions were morally wrong or acceptable. Ten pairs of human-generated and LLM-generated responses were then used as stimuli in a modified Moral Turing Test (m-MTT), in which a different set of human participants rated the quality of these responses. Participants rated the LLM-generated responses as higher in moral virtuousness, trustworthiness, and intelligence. Nevertheless, participants were able to distinguish the human-generated from the LLM-generated responses.
| Original language | English |
|---|---|
| Pages (from-to) | 39168 |
| Journal | Scientific Reports |
| Volume | 15 |
| Issue number | 1 |
| DOIs | |
| State | Published - Nov 7 2025 |