Evaluating the reliability and guideline concordance of ChatGPT-5 in the management of vascular diseases: a cross-sectional expert-based assessment.
Artificial intelligence (AI) tools such as large language models are increasingly used in clinical decision support, yet their reliability in vascular medicine remains uncertain. This study evaluated the accuracy and guideline concordance of ChatGPT-5 in vascular disease management.
Seventy open-ended clinical questions were derived from five major national and international vascular guidelines. Responses generated by ChatGPT-5 were independently assessed by five cardiovascular surgeons using a five-point Likert scale. Inter-rater agreement was analyzed using the free-marginal multirater kappa statistic.
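The agreement analysis named above can be illustrated with a brief sketch of Randolph's free-marginal multirater kappa, assuming each question is rated by the same number of raters into a fixed set of categories (function and variable names here are illustrative, not from the study):

```python
def free_marginal_kappa(ratings, q):
    """Randolph's free-marginal multirater kappa.

    ratings: list of per-item category-count vectors, each of length q
             (e.g. [5, 0, 0, 0, 0] means all 5 raters chose category 1).
    q: number of rating categories (5 for a five-point Likert scale).
    """
    n = sum(ratings[0])  # raters per item, assumed constant across items
    # Observed agreement: mean over items of the proportion of
    # agreeing rater pairs.
    p_o = sum(
        sum(c * (c - 1) for c in item) / (n * (n - 1))
        for item in ratings
    ) / len(ratings)
    p_e = 1.0 / q  # free-marginal chance agreement
    return (p_o - p_e) / (1 - p_e)
```

For example, with five raters and five categories, an item on which all raters agree contributes full observed agreement, while an item split one rater per category contributes none; kappa then scales the mean observed agreement against the chance level of 1/q.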
ChatGPT-5 achieved a mean score of 4.74±0.27, indicating strong concordance with evidence-based recommendations. Forty of the 70 questions (57%) received perfect agreement among all raters, and inter-rater reliability was moderate (κ=0.50; 95% CI: 0.37-0.64).
ChatGPT-5 produced guideline-aligned and clinically sound responses in vascular disease scenarios. While the model shows promise as a supportive clinical tool, validation on broader datasets and in real-world settings is needed to ensure clinical translatability.