Predicting survival of Hodgkin lymphoma using machine learning: an analysis based on the SEER database.

This study aimed to develop an effective model for predicting Hodgkin lymphoma (HL) prognosis to assist clinicians in making optimal clinical decisions.

This study screened HL patients from the Surveillance, Epidemiology, and End Results (SEER) database from 2000 to 2021. Feature selection was performed using the Boruta algorithm. Four machine learning (ML) models were built on the selected features. The area under the curve (AUC), decision curve analysis, and Brier score were employed to evaluate the reliability of the four ML models. Feature importance was ranked using SHapley Additive exPlanations (SHAP). Based on the results of the SHAP plot, Kaplan-Meier analysis was used to compare survival probabilities among different groups.
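As a rough illustration of three of the quantities named above (AUC, Brier score, and the Kaplan-Meier survival estimate), the following minimal sketch computes each from first principles. It is not the authors' code: the function names and the toy inputs are assumptions made for illustration, and the AUC and Brier functions treat the outcome as a plain binary label, ignoring the censoring adjustments that time-dependent survival metrics normally require.

```python
from collections import Counter


def auc(scores, labels):
    """Rank-based AUC: probability that a random positive case
    receives a higher score than a random negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and the
    observed 0/1 outcome (lower is better). Censoring is ignored here."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability),
    one entry per distinct time at which a death was observed.

    times  -- follow-up duration of each patient
    events -- 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # everyone leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
    print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Comparing groups, as in the Kaplan-Meier analysis described above, would amount to calling `kaplan_meier` once per group and comparing the resulting curves.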

Among the 11,259 enrolled HL patients, 8,928 were alive and 2,331 had died. Primary site, year of diagnosis, B symptoms, surgery, marital status, Ann Arbor stage, radiation, SEER stage, chemotherapy, delay (diagnosis to treatment), and age were associated with HL overall survival (OS). Of the four ML models, the eXtreme Gradient Boosting (XGBoost) model exhibited superior predictive performance. For predicting 1-year OS, the net benefit of the XGBoost, Cox proportional hazards (Coxph), and Random Survival Forest (RSF) models was significantly higher than that of the Light Gradient Boosting Machine (LightGBM) model, the treat-all model, and the treat-none model. Age, Ann Arbor stage, B symptoms, marital status, and radiation were the top five indicators in the feature importance ranking for HL OS.
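The net benefit comparison above comes from decision curve analysis. A minimal sketch of the standard net benefit formula, net benefit = TP/n - (FP/n) × pt/(1 - pt) at threshold probability pt, is given below. The function names and toy numbers are assumptions for illustration only; the outcome is treated as a binary event (for example, death within one year), and censoring is not handled, so this is not a reproduction of the study's analysis.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of acting on every patient whose predicted risk is
    at or above `threshold` (a probability strictly between 0 and 1)."""
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # weight placed on a false positive
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the treat-all strategy: everyone is flagged."""
    prevalence = sum(outcomes) / len(outcomes)
    odds = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * odds


def net_benefit_treat_none():
    """Treat-none flags nobody, so its net benefit is zero by definition."""
    return 0.0


if __name__ == "__main__":
    # Hypothetical predicted risks and observed outcomes.
    probs = [0.9, 0.7, 0.4, 0.2]
    outcomes = [1, 0, 1, 0]
    for pt in (0.3, 0.5):
        print(pt, net_benefit(probs, outcomes, pt), net_benefit_treat_all(outcomes, pt))
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all and treat-none values, which is the kind of comparison reported for 1-year OS.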

The XGBoost model showed excellent predictive performance as a prognostic model, which can further help clinicians select appropriate treatment options.

Not applicable.

Authors

Cai Cai, Kang Kang