Can Artificial Intelligence Improve the Readability of Patient Education Information in Gynecology?

Research output: Contribution to journal › Article › peer-review

Abstract

Background: The American Medical Association recommends that patient information be written at a sixth-grade level to increase accessibility. However, most existing patient education materials exceed this threshold, posing challenges to patient comprehension. Artificial intelligence, particularly large language models, presents an opportunity to improve the readability of medical information. Despite the growing integration of artificial intelligence in healthcare, few studies have evaluated the effectiveness of large language models in generating patient education materials or improving the readability of existing ones within gynecology.

Objective: To assess the readability and effectiveness of patient education materials generated by ChatGPT, Gemini, and CoPilot compared with materials from the American College of Obstetricians and Gynecologists and UpToDate.com, and to determine whether these large language models can successfully adjust the reading level to a sixth-grade standard.

Study Design: This cross-sectional study analyzed American College of Obstetricians and Gynecologists, UpToDate, and large language model–generated content, evaluating the large language models on 2 tasks: 1) generating patient education materials independently and 2) rewriting existing patient information to a sixth-grade reading level. All materials underwent basic textual analysis and were assessed for readability using 8 readability formulas. Two board-certified obstetrician-gynecologists evaluated blinded patient education materials for accuracy, clarity, and comprehension. Analysis of variance was used to compare textual analysis and readability scores, with Tukey post-hoc tests identifying differences for both original and enhanced materials. An alpha threshold of P<.004 was used to account for multiple comparisons.

Results: Large language model–generated materials were significantly shorter (mean word count 407.9 vs 1132.0; P<.001) but had a higher proportion of difficult words (36.7% vs 27.4%; P<.001). American College of Obstetricians and Gynecologists and UpToDate materials averaged ninth-grade and 8.6-grade reading levels, respectively, while artificial intelligence–generated content reached a 10.6-grade level (P = .008). Although CoPilot and Gemini improved readability when prompted, no large language model successfully reached the sixth-grade benchmark, and ChatGPT increased reading difficulty.

Conclusion: Large language models generated more concise patient education materials but often introduced more complex vocabulary, ultimately failing to meet recommended health literacy standards. Even when explicitly prompted, no large language model achieved the sixth-grade reading level required for optimal patient comprehension. Without proper oversight, artificial intelligence–generated patient education materials may create the illusion of simplicity while reducing true accessibility. Future efforts should focus on integrating health literacy safeguards into artificial intelligence models before clinical implementation.
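The abstract does not name the 8 readability formulas used, but grade-level scoring of this kind is commonly done with formulas such as Flesch-Kincaid, which combine sentence length and syllable counts. As an illustration only (not the study's actual methodology), the sketch below computes the standard Flesch-Kincaid Grade Level with a naive syllable heuristic; production tools use dictionary-based syllable counts.

```python
import re


def count_syllables(word: str) -> int:
    """Naive heuristic: count vowel groups, dropping a trailing silent 'e'."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    count = len(groups)
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)


# Short, monosyllabic sentences score well below the sixth-grade benchmark;
# dense polysyllabic prose scores far above it.
simple = flesch_kincaid_grade("The cat sat on the mat.")
dense = flesch_kincaid_grade(
    "Comprehensive gynecological education materials frequently "
    "incorporate specialized terminology."
)
```

A grade of 6.0 or lower on such a formula corresponds to the sixth-grade benchmark the study uses; the paper's point is that model-generated text can be shorter yet still score above it because of vocabulary complexity.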
Original language: English
Pages (from-to): 640.e1-640.e9
Journal: American Journal of Obstetrics and Gynecology
Volume: 233
Issue number: 6
DOIs
State: Published - Dec 2025

ASJC Scopus Subject Areas

  • Obstetrics and Gynecology

Keywords

  • ACOG
  • ChatGPT
  • CoPilot
  • LLM
  • artificial intelligence
  • Bing
  • Gemini
  • gynecology
  • patient education
  • women's health
  • Cross-Sectional Studies
  • United States
  • Humans
  • Artificial Intelligence
  • Patient Education as Topic/methods
  • Gynecology/education
  • Health Literacy
  • Comprehension
  • Obstetrics
  • Female

Disciplines

  • Obstetrics and Gynecology
