Comparative Evaluation of Ensemble Machine Learning Approaches for Heart Disease Prediction.

This paper presents a computational benchmarking assessment of ensemble learning algorithms for heart disease prediction, combining different machine learning classifiers through hard voting, soft voting, and stacking in a single framework. The evaluation was conducted using a publicly available cardiovascular dataset obtained from the Kaggle repository (https://www.kaggle.com/datasets/sid321axn/heart-statlog-cleveland-hungary-final), comprising 1,190 instances and 11 clinical features. The process involves data preprocessing, which includes handling missing values, removing outliers, scaling variables, and class balancing to ensure uniform input. Feature selection based on Random Forest (RF) is then used to eliminate unnecessary features. Among the evaluated models, the stacking ensemble classifier achieved the highest overall accuracy, 91.88% on the test dataset. Although additional metrics such as precision, recall, and F1-score were computed for comparative analysis, the emphasis of this study remains on methodological benchmarking rather than clinical validation. Various base classifiers, including Decision Tree, Random Forest, AdaBoost, and XGBoost, are applied and tested independently. These models are then combined using hard voting, soft voting, and stacking. In stacking, Logistic Regression is used as the meta-model and is trained on cross-validated, out-of-fold predictions of the base classifiers to avoid overfitting. Accuracy is the primary criterion for comparison, so that individual classifiers and their combination strategies can be compared uniformly under the same preprocessing and validation environment.
This protocol facilitates the comparison of ensemble machine learning algorithms on publicly available cardiovascular datasets and supports a systematic comparison of data preprocessing and ensemble configuration approaches.
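
To illustrate how the hard and soft voting strategies named in the abstract can disagree, the following minimal Python sketch combines per-classifier probabilities for a single patient. The function names `hard_vote` and `soft_vote`, the 0.5 decision threshold, and the example probabilities are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def hard_vote(probas: Sequence[float], threshold: float = 0.5) -> int:
    """Majority vote over each classifier's thresholded class label."""
    labels = [int(p >= threshold) for p in probas]
    # Ties are resolved by first occurrence, so an odd number of
    # classifiers is assumed here to keep the outcome unambiguous.
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas: Sequence[float], threshold: float = 0.5) -> int:
    """Average the predicted probabilities, then threshold the mean."""
    mean_proba = sum(probas) / len(probas)
    return int(mean_proba >= threshold)


# Hypothetical P(heart disease) from three base classifiers for one patient.
patient = [0.45, 0.48, 0.95]

print(hard_vote(patient))  # two of the three classifiers vote "no disease"
print(soft_vote(patient))  # the mean is about 0.627, above the threshold
```

In this made-up case, hard voting follows the two weakly negative classifiers, while soft voting is swayed by the one highly confident classifier. Stacking would replace both fixed rules with a trained meta-model (Logistic Regression in this study) fitted on out-of-fold predictions.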
Cardiovascular diseases
Care/Management

Authors

Baral Baral, Satpathy Satpathy, Satpathy Satpathy, Kumar Baral Kumar Baral