Virtual case reasoning and AI-assisted diagnostic instruction: an empirical study based on Body Interact and large language models.

Integrating large language models (LLMs) with virtual patient platforms offers a novel approach to teaching clinical reasoning. This study evaluated the performance and educational value of combining Body Interact with two AI models, ChatGPT-4 and DeepSeek-R1, across acute care scenarios.

Two medical researchers simulated three standardized cases (coma, stroke, trauma) in Body Interact. Structured summaries of each case were then entered into both models using identical prompts. Outputs were assessed for diagnostic and treatment consistency, alignment with clinical reasoning stages, and educational quality, using expert scoring, AI self-assessment, text readability indices, and Grammarly analysis.
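The study's evaluation scripts are not published; the sketch below only illustrates the general workflow of submitting one identically worded prompt to both models and scoring the replies with standard readability indices. The model names (gpt-4, deepseek-reasoner), the DeepSeek endpoint, the prompt wording, and the textstat-based metrics are all assumptions for illustration, not the study's actual protocol.

```python
# Illustrative sketch (assumptions flagged inline): query two chat models
# with the same structured case summary, then score each reply's readability.
from openai import OpenAI
import textstat  # pip install openai textstat

# Hypothetical prompt; the study's exact wording is not published.
PROMPT = (
    "You are teaching clinical reasoning. For the structured case summary "
    "below, give the most likely diagnosis, the key reasoning steps, and "
    "the initial treatment plan.\n\n{case_summary}"
)

def query(client: OpenAI, model: str, case_summary: str) -> str:
    """Send the identical prompt to one model and return its reply."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT.format(case_summary=case_summary)}],
    )
    return resp.choices[0].message.content

def readability(text: str) -> dict:
    """Flesch indices as stand-ins for the paper's readability metrics."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
    }

if __name__ == "__main__":
    # Placeholder summary standing in for one of the three simulated cases.
    case = "58-year-old found unresponsive; GCS 6; pinpoint pupils; ..."
    gpt = OpenAI()  # reads OPENAI_API_KEY from the environment
    deepseek = OpenAI(base_url="https://api.deepseek.com",  # assumed endpoint
                      api_key="YOUR_DEEPSEEK_KEY")
    for name, client, model in [("ChatGPT-4", gpt, "gpt-4"),
                                ("DeepSeek-R1", deepseek, "deepseek-reasoner")]:
        reply = query(client, model, case)
        print(name, readability(reply))
```

Holding the prompt constant is what makes the cross-model comparison meaningful: any difference in readability or diagnostic content can then be attributed to the model rather than to the instruction.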

ChatGPT-4 performed best in stroke scenarios but was less consistent in coma and trauma cases. DeepSeek-R1 showed more stable diagnostic and therapeutic output across all cases. While both models received high expert and self-assessment scores, ChatGPT-4 produced more readable outputs, and DeepSeek-R1 demonstrated greater grammatical precision.

Our findings suggest that ChatGPT-4 and DeepSeek-R1 offer complementary strengths for AI-assisted instruction. ChatGPT-4's more accessible language may better support early learners, whereas DeepSeek-R1's output may align more closely with formal clinical reasoning. Selecting a model according to the specific teaching goal can therefore enhance the effectiveness of AI-driven medical education.

Authors

Chen Chen, Lin Lin, Zhang Zhang, Luo Luo, Shin Shin, Li Li