Exploring multimodal large language models on transthoracic echocardiogram (TTE) tasks for cardiovascular decision support.

Multimodal large language models (LLMs) offer new potential for enhancing cardiovascular decision support, particularly in interpreting echocardiographic data. This study systematically evaluates and benchmarks foundation models from diverse domains on echocardiogram-based tasks to assess their effectiveness, limitations and potential in clinical cardiovascular applications.

We curated three cardiovascular imaging datasets: EchoNet-Dynamic, TMED2, and an expert-annotated transthoracic echocardiogram (TTE) dataset. We used them to evaluate performance on four critical tasks: (1) cardiac function evaluation through ejection fraction (EF) prediction, (2) cardiac view classification, (3) aortic stenosis (AS) severity assessment, and (4) cardiovascular disease classification. We evaluated six multimodal LLMs: EchoClip (cardiovascular-specific), BiomedGPT and LLaVA-Med (medical-domain), and MiniCPM-V 2.6, LLaMA-3-Vision-Alpha, and Gemini-1.5 (general-domain). Models were assessed using zero-shot, few-shot, and fine-tuning strategies, where applicable. Performance was measured using mean absolute error (MAE) and root mean squared error (RMSE) for EF prediction, and accuracy, precision, recall, and F1 score for the classification tasks.
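The metrics named above can be illustrated with a minimal sketch. The function names and toy values are hypothetical, and macro-averaging across classes is an assumption, since the abstract does not state how per-class precision, recall, and F1 were aggregated.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted EF values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted EF values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for this sketch; the source
    does not specify the averaging scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy values for illustration only; these are not data from the study.
ef_true = [60.0, 55.0, 40.0]
ef_pred = [50.0, 65.0, 40.0]
views_true = ["A4C", "A4C", "PLAX"]
views_pred = ["A4C", "PLAX", "PLAX"]

ef_mae = mae(ef_true, ef_pred)
ef_rmse = rmse(ef_true, ef_pred)
view_metrics = classification_metrics(views_true, views_pred)
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a sense of whether a model's EF error is spread evenly or driven by a few badly wrong predictions.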

Domain-specific models such as EchoClip demonstrated the strongest zero-shot performance in EF prediction, achieving an MAE of 10.34. General-domain models showed limited effectiveness without adaptation, with MiniCPM-V 2.6 reporting an MAE of 251.92. Fine-tuning significantly improved outcomes; for example, MiniCPM-V 2.6's MAE decreased to 31.93, and view classification accuracy increased from 20% to 63.05%. In classification tasks, EchoClip achieved F1 scores of 0.2716 for AS severity and 0.4919 for disease classification but exhibited limited performance in view classification (F1 = 0.1457). Few-shot learning yielded modest gains but was generally less effective than fine-tuning.
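To put the reported fine-tuning gains on a common scale, the short sketch below works through the arithmetic using only the figures quoted above. The helper names are hypothetical and exist only for this illustration.

```python
def relative_reduction(before, after):
    """Fractional decrease of an error metric, relative to its starting value."""
    return (before - after) / before


def percentage_point_gain(before_pct, after_pct):
    """Absolute change between two percentages, in percentage points."""
    return after_pct - before_pct


# Figures quoted in the abstract for MiniCPM-V 2.6.
mae_drop = relative_reduction(251.92, 31.93)       # roughly 0.873
accuracy_gain = percentage_point_gain(20.0, 63.05)  # roughly 43.05

print(f"EF MAE reduction after fine-tuning: {mae_drop:.1%}")
print(f"View classification accuracy gain: {accuracy_gain:.2f} percentage points")
```

On these numbers, fine-tuning cuts MiniCPM-V 2.6's EF error by roughly 87%, yet its fine-tuned MAE of 31.93 remains above EchoClip's zero-shot MAE of 10.34.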

This evaluation and benchmarking study demonstrated the importance of domain-specific pretraining and model adaptation in cardiovascular decision support tasks. Cardiovascular-focused models and fine-tuned general-domain models achieved superior performance, especially for complex assessments such as EF estimation. These findings offer critical insights into the current capabilities and future directions for clinically meaningful AI integration in cardiovascular medicine.
Cardiovascular diseases
Care/Management

Authors

Li Li, Li Li, Sun Sun, Yu Yu, Abdelhameed Abdelhameed, Cao Cao, Li Li, He He, Li Li, Feng Feng, Yu Yu, Hu Hu, Li Li, Kumar Kumar, Dang Dang, Li Li, Gharacholou Gharacholou, Tao Tao