Development and validation of a risk stratification model for sarcopenia in patients with chronic lung disease: a cross-sectional study based on CHARLS data.
The aim of this study was to develop a machine learning-based stratification model to identify high-risk individuals for sarcopenia among patients with chronic lung disease (CLD), thereby facilitating early personalised management of this complication.
We included 1833 complete patient records with CLD diagnoses from the China Health and Retirement Longitudinal Study (CHARLS) dataset, comprising 388 sarcopenia cases and 1445 non-sarcopenia controls. Seventeen variables were collected, including demographic characteristics (age, gender, waist circumference, education level), lifestyle factors and chronic comorbidities. Data were split into training and test sets (7:3 ratio). Variables were screened using Least Absolute Shrinkage and Selection Operator (LASSO) regression, and six machine learning algorithms were employed to construct and validate stratification models, with performance evaluated through multiple metrics. Temporal validation (n=1205) was used to assess robustness, and SHapley Additive exPlanations (SHAP) analysis was used to support interpretability.
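The 7:3 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say whether the split was stratified by outcome, so stratification, the helper name `stratified_split` and the random seed are all assumptions. Only the class counts (388 cases, 1445 controls) come from the abstract.

```python
import random


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split indices into train/test, keeping each class at roughly train_frac.

    Hypothetical helper: stratifying by outcome is an assumption, since the
    abstract only reports a 7:3 ratio.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Collect and shuffle the indices belonging to this class only.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))  # truncates towards zero
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)


# Class counts reported in the abstract: 1 = sarcopenia, 0 = no sarcopenia.
labels = [1] * 388 + [0] * 1445
train_idx, test_idx = stratified_split(labels)

# Share of sarcopenia cases in each partition (both close to 388/1833).
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

Splitting each class separately keeps the proportion of sarcopenia cases nearly identical in both partitions, which matters when cases make up only about a fifth of the records.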
All six machine learning algorithms demonstrated excellent performance in both the training and test sets, as evidenced by receiver operating characteristic curve analysis. Among them, eXtreme Gradient Boosting (XGBoost) achieved the highest overall performance (area under the curve=0.93). The feature importance analysis identified waist circumference, age and gender as the three most important predictors of sarcopenia in patients with CLD.
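For readers unfamiliar with the area under the curve (AUC) quoted above, the sketch below computes it from its rank-based definition: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The function name `roc_auc` and the toy scores are illustrative assumptions and are unrelated to the study data.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 2 cases and 3 controls.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(y_true, scores)  # 5 of the 6 case-control pairs are ordered correctly
```

The pairwise loop is quadratic in the number of records, which is fine for illustration; library implementations typically use a sort-based method instead.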
This study developed an interpretable machine learning-based risk stratification model for sarcopenia in patients with CLD. The model may serve as a novel clinical tool to support early personalised interventions and improve patient prognosis.
Authors
Pang Pang, Xu Xu, Zhong Zhong, Huang Huang, Xian Xian, Yang Yang, Lin Lin, Pang Pang, Chen Chen, Miao Miao, Wang Wang, Chen Chen, Sun Sun, Sun Sun