The XGBoost Model Versus the Logistic Regression Model Created Based on Serum Markers in Predicting the Risk of Post-Stroke Cognitive Impairment Following Acute Ischemic Stroke.
Acute ischemic stroke is a major cause of cognitive dysfunction. Early identification of post-stroke cognitive impairment (PSCI) is crucial for improving patient prognosis. While there has been extensive research on prognostic models for acute ischemic stroke, the selection of predictive factors remains heavily reliant on neuroimaging parameters. This study aims to create and compare the eXtreme gradient boosting (XGBoost) and logistic regression (LR) models based on serum biomarkers for predicting the risk of PSCI following acute ischemic stroke.
The study enrolled 261 adult patients with acute ischemic stroke within 7 days of onset. Their baseline characteristics, serum markers, and scores on the National Institutes of Health Stroke Scale (NIHSS) and the Montreal Cognitive Assessment (MoCA) were collected. Cognitive function was assessed 3 months (±2 weeks) after stroke, and PSCI was diagnosed when the MoCA score was < 26. Patients were randomly assigned to the training dataset (n = 183) and the testing dataset (n = 78) in a ratio of 7:3. Significant features for predicting the risk of PSCI were selected via LassoCV in R. Accuracy, F1 score, Cohen's kappa, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were measured to assess the performance of the XGBoost and LR prediction models. Finally, the optimal prediction model was examined with SHapley additive exPlanations (SHAP) beeswarm and force plots.
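The seven evaluation measures listed above can all be derived from a single 2×2 confusion matrix. The following is a minimal sketch in plain Python, assuming binary labels coded 1 for PSCI and 0 for no PSCI; the names `ConfusionCounts`, `confusion_counts`, and `binary_metrics` are hypothetical helpers introduced for illustration and are not taken from the study's code, and the toy labels at the end are made up rather than study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # PSCI predicted, PSCI observed
    fp: int  # PSCI predicted, no PSCI observed
    tn: int  # no PSCI predicted, no PSCI observed
    fn: int  # no PSCI predicted, PSCI observed


def confusion_counts(y_true, y_pred):
    """Tally a 2x2 confusion matrix; labels must be 1 (PSCI) or 0 (no PSCI)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a denominator is empty.
    return numerator / denominator if denominator else float("nan")


def binary_metrics(c):
    """Compute the seven measures named in the methods from the counts."""
    n = c.tp + c.fp + c.tn + c.fn
    accuracy = _ratio(c.tp + c.tn, n)
    # Agreement expected by chance, from the row and column totals.
    p_chance = _ratio(
        (c.tp + c.fp) * (c.tp + c.fn) + (c.tn + c.fn) * (c.tn + c.fp),
        n * n,
    )
    return {
        "accuracy": accuracy,
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "kappa": _ratio(accuracy - p_chance, 1 - p_chance),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


if __name__ == "__main__":
    # Made-up toy labels for illustration only; not data from the study.
    observed = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 1, 0, 0, 0, 1, 1, 0]
    for name, value in binary_metrics(confusion_counts(observed, predicted)).items():
        print(f"{name}: {value:.3f}")
```

Applying the same function to the testing-set predictions of each model would give a like-for-like comparison, since both sets of measures then come from confusion matrices built on the same 78 patients.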
The incidence of PSCI and other baseline characteristics were comparable between the training and testing datasets (all P > 0.05). Vascular endothelial cadherin (VE-Cad), NIHSS score, age, drinking history, C-reactive protein (CRP), and education years were the features associated with the risk of PSCI. The XGBoost model was superior to the LR model in accuracy, F1 score, and sensitivity for predicting the risk of PSCI. Beeswarm and force plots illustrated how these features contributed to the XGBoost model's predictions of PSCI risk in patients with acute ischemic stroke.
Based on serum biomarkers, the XGBoost model can accurately predict the risk of PSCI in patients with acute ischemic stroke, outperforms the LR model, and may serve as a reliable tool for early identification to improve diagnosis.
From 261 acute ischemic stroke patients (training n = 183, testing n = 78), we collected demographic data, cognitive assessments, and serum indicators. LassoCV identified sensitive predictors including VE-Cad, NIHSS score, CRP, age, drinking history, and education years. The XGBoost model demonstrated superior performance over LR in predicting PSCI risk. SHAP analysis revealed how these variables influenced model predictions. Based on serum biomarkers, the XGBoost model accurately predicts PSCI risk and may serve as a reliable tool for early identification to improve diagnosis.