Decoupled quality and readability in skin cancer education from large language models.

Large language models (LLMs) are increasingly used by the public to obtain health information, yet the relationship between content quality and readability in LLM-generated patient education remains unclear.

We benchmarked five LLMs (Doubao, DeepSeek, Wenxin Yiyan, Tongyi Qianwen, and GPT-5) on an identical set of 20 Mandarin Chinese skin-cancer FAQs (100 outputs in total). Quality was assessed with the c-PEMAT-P (Patient Education Materials Assessment Tool for Printable Materials, Chinese version) and the Global Quality Scale (GQS); readability was assessed with seven indices: Automated Readability Index (ARI), Flesch Reading Ease Score (FRES), Gunning Fog Index (GFOG), Flesch-Kincaid Grade Level (FKGL), Coleman-Liau Index (CL), SMOG, and Linsear Write (LW). Group differences and correlations were evaluated with appropriate statistical tests.

The models showed comparable understandability and actionability (c-PEMAT-P), while overall quality (GQS) differed, with GPT-5 scoring highest. Readability varied substantially by both model and content category, and no single model performed best across all readability metrics. Correlation analyses indicated that quality and readability were largely decoupled.

High-quality outputs are not necessarily highly readable. Optimizing AI-generated skin-cancer education therefore requires multi-faceted strategies that jointly consider model choice and content topic.
Cancer
Care/Management
Advocacy
Education

Authors

Zhang Zhang, Wang Wang, Zhang Zhang, Lan Lan