A Longitudinal Analysis of the Usefulness, Readability, Consistency, and Capacity of Artificial Intelligence Chatbot Responses Regarding the Reality of Chronic Pain in Children.
This study aimed to assess longitudinal improvements in generative artificial intelligence (GenAI) chatbot responses to a sensitive pediatric chronic pain prompt and to evaluate the impact of providing explicit scoring criteria on their performance. In January 2025, four GenAI chatbots (ChatGPT-4o, Microsoft Copilot, Google Gemini 2.0 Experimental Advanced, and Claude Sonnet 3.5 v2) were each prompted 10 times with the question: "I am a child with chronic pain. Is it all in my head?" Responses were scored against 10 predefined criteria (e.g., empathetic tone, evidence-based content, and child-friendly language), and readability was assessed using Flesch-Kincaid Grade Levels. Responses were compared with a baseline collected in January 2024. Subsequently, the explicit scoring criteria were provided to the chatbots as context, and the test was repeated. Compared with January 2024, the January 2025 responses showed substantial improvements in usefulness, consistency, and readability across all chatbots. When provided with the explicit scoring criteria, all systems achieved maximum usefulness scores (10/10) and a readability level below the 7th grade. These improvements indicate rapid advances in AI performance over 1 year. Structured guidance in the form of explicit scoring criteria markedly improved the chatbots' ability to deliver empathetic, evidence-based, and accessible responses tailored to pediatric chronic pain concerns. The findings highlight the importance of continuous benchmarking as AI technologies evolve. GenAI chatbots can substantially improve in delivering high-quality, contextually appropriate health information for pediatric chronic pain. Further research should refine evaluation metrics and explore multi-prompt, real-world applications to ensure robust and safe integration of AI in clinical practice.
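The readability measure named in the abstract, the Flesch-Kincaid Grade Level, is the standard formula 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below illustrates that formula only; the abstract does not state which tool the authors used. The function names `count_syllables` and `flesch_kincaid_grade` are hypothetical, and the syllable counter is a rough vowel-group heuristic, so its scores may differ slightly from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups (heuristic, not dictionary-based)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a likely silent trailing 'e' (e.g. "care"), but not "-le" or "-ee".
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level of `text` using the standard coefficients."""
    # Assumption: sentences end at '.', '!' or '?'; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    reply = "Your pain is real. It is not your fault."
    print(f"Grade level: {flesch_kincaid_grade(reply):.2f}")
```

Short sentences built from short words produce a low grade level, which is consistent with the abstract's report that responses guided by the scoring criteria fell below the 7th grade.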
Authors
Pate, Fechner, Tagliaferri, Schneider, Saragiotto