Development of advanced lung cancer inflammation index-based machine learning models for predicting stroke and mortality: A comparative and interpretable study.
The Advanced Lung Cancer Inflammation Index (ALI) is a novel composite index that enables a more holistic evaluation of inflammation and nutritional status than established single or commonly used indices. However, ALI has not been extensively studied in patients with stroke. In this study, we aimed to 1) investigate the association between ALI and stroke risk, 2) investigate the association between ALI and all-cause mortality among patients with stroke, and 3) develop and interpret machine learning (ML) models to predict stroke and prognosis.
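The abstract does not state how ALI is computed. As a minimal sketch, assuming the definition commonly used in the literature (body mass index in kg/m² multiplied by serum albumin in g/dL, divided by the neutrophil-to-lymphocyte ratio), the index could be calculated as follows. The function and parameter names are hypothetical and are not taken from the study.

```python
def advanced_lung_cancer_inflammation_index(bmi, albumin_g_dl, neutrophils, lymphocytes):
    """Return ALI = BMI * albumin / NLR.

    Assumed definition (not stated in the abstract):
      bmi           body mass index, kg/m^2
      albumin_g_dl  serum albumin, g/dL
      neutrophils   absolute neutrophil count
      lymphocytes   absolute lymphocyte count (same unit as neutrophils)
    """
    if neutrophils <= 0 or lymphocytes <= 0:
        raise ValueError("cell counts must be positive")
    # Neutrophil-to-lymphocyte ratio (NLR): higher values suggest more inflammation.
    nlr = neutrophils / lymphocytes
    # Higher BMI and albumin raise ALI; a higher NLR lowers it.
    return bmi * albumin_g_dl / nlr


# Illustrative values only, not study data.
example_ali = advanced_lung_cancer_inflammation_index(
    bmi=25.0, albumin_g_dl=4.0, neutrophils=4.0, lymphocytes=2.0
)
```

Under this assumed definition, a lower ALI reflects poorer nutritional status, greater inflammation, or both.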
Using data from the National Health and Nutrition Examination Survey (NHANES) 1999-2018, logistic regression and Cox regression were used to assess the associations of ALI with stroke and mortality. Non-linear relationships were analysed using restricted cubic splines, and subgroup analyses were performed by stratification. Logistic regression (LR), extreme gradient boosting (XGBoost), random forest (RF), k-nearest neighbour (KNN), support vector machine (SVM), and decision tree (DT) models were developed for stroke and mortality prediction and evaluated using the area under the receiver operating characteristic curve (AUCROC) and metrics such as accuracy. Shapley additive explanations (SHAP) and Gini importance were used to provide dual interpretability of the models.
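To make the two evaluation metrics named above concrete, the following is a minimal, dependency-free sketch of how AUCROC and accuracy can be computed from predicted probabilities. It is not the study's pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions for illustration.

```python
def auc_roc(y_true, scores):
    """AUCROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases classified correctly at an assumed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Toy labels (1 = event) and predicted probabilities; not study data.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_roc(labels, probs)
toy_acc = accuracy(labels, probs)
```

AUCROC is threshold-independent, whereas accuracy depends on the chosen cut-off and on class balance, which is one reason both are usually reported together.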
Among the 46,451 participants, higher ALI was associated with a lower stroke risk, whereas mortality decreased with increasing ALI before stabilising at an inflection point (ALI = 40.91, Pthreshold < 0.001). Age stratification significantly modified the association between ALI and mortality. The RF model marginally outperformed the other models in terms of stroke identification (AUCROC: 0.9657, accuracy: 95.63 %) and mortality prediction (AUCROC: 0.7771, accuracy: 70.65 %). The SHAP and Gini importance analyses highlighted cardiovascular diseases as key factors for stroke prediction and age for mortality, with ALI being less influential.
Leveraging the nationally representative NHANES database, this exploratory analysis revealed that ALI had an inverse dose-response association with stroke risk and an "L-shaped" relationship with all-cause mortality among patients with stroke. Among the six ML models, the dual-interpretable RF models based on ALI showed promising potential for stroke identification and prognosis prediction, with performance comparable to that of the other models.