Machine learning-based prediction of CAC-defined cardiovascular risk using routine health examination data: a retrospective cross-sectional study in a Taiwanese population.
Early identification of individuals at elevated cardiovascular risk using routine health examination data is essential for preventive cardiology. Machine learning (ML) offers a scalable and non-invasive approach to enhance risk stratification.
This retrospective study analyzed 899 asymptomatic adults (mean age: 57.3 ± 11.8 years; 608 men) who underwent coronary artery calcium (CAC) scanning or coronary computed tomography angiography (CCTA) at a health management center in Taiwan between 2018 and 2021. Participants were classified into four CAC-based risk categories (0, 1-99, 100-299, ≥300). Nineteen demographic and clinical variables were used to train decision tree (DT), random forest (RF), and support vector machine (SVM) classifiers. Model performance was evaluated using accuracy and AUC, with AUC differences assessed by DeLong's test.
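The four-category outcome described above can be illustrated with a minimal sketch that maps a CAC score to its risk category. The names `CAC_CUTPOINTS`, `CAC_LABELS`, and `cac_category` are hypothetical and do not come from the study; the sketch also assumes that scores are non-negative and that any score below 1 belongs to the "0" category.

```python
from bisect import bisect_right

# Lower bounds of the second, third, and fourth categories (assumed from
# the category ranges stated in the text: 0, 1-99, 100-299, >=300).
CAC_CUTPOINTS = (1, 100, 300)
CAC_LABELS = ("0", "1-99", "100-299", "≥300")


def cac_category(score: float) -> int:
    """Return the category index (0 to 3) for a CAC score.

    Hypothetical helper for illustration; not the authors' code.
    """
    if score < 0:
        raise ValueError("CAC score must be non-negative")
    # bisect_right counts how many cut points are <= score,
    # which is exactly the index of the category the score falls in.
    return bisect_right(CAC_CUTPOINTS, score)


if __name__ == "__main__":
    for s in (0, 1, 99, 100, 299, 300, 1250):
        print(s, CAC_LABELS[cac_category(s)])
```

The resulting index could serve as the class label for a multi-class classifier such as the DT, RF, and SVM models named above.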
The RF model achieved the highest accuracy (76%) with an AUC of 0.78. The SVM model matched that AUC (0.78) at a lower accuracy (70%), and the DT model reached an accuracy of 74% with an AUC of 0.75. All models showed clinically meaningful discrimination using readily accessible, non-laboratory health examination data.
ML models incorporating routine health examination variables can effectively predict CAC-defined cardiovascular risk and may serve as practical, scalable pre-screening tools within preventive healthcare workflows, particularly in settings where laboratory testing or advanced imaging resources may be limited.