Machine learning based on clinical and gene expression data assists in survival prediction and treatment optimization for diffuse large B-cell lymphoma patients.
Diffuse large B-cell lymphoma (DLBCL) is an aggressive and common subtype of non-Hodgkin lymphoma (NHL). Despite the availability of several risk stratification tools, there remains substantial room to improve personalized prognostic prediction. Moreover, given the heterogeneity of DLBCL, selecting an appropriate treatment for each individual patient remains a clinical challenge. In this study, we developed a random survival forests model integrating clinical and gene expression data from 677 DLBCL cases in the Gene Expression Omnibus (GEO) database. The model predicted overall survival with high concordance in both the training and validation datasets (C-index: 0.832 and 0.758, respectively), outperforming common prognostic markers such as cell-of-origin subtype, International Prognostic Index (IPI) score, and Ann Arbor stage. Time-dependent ROC curves also showed good predictive performance for 1-, 3-, and 5-year survival in the training and validation cohorts. The models are accessible via an open-access website. Survival analysis showed that receiving the optimal treatment was associated with more favorable survival. Furthermore, using Kaplan-Meier curves, multivariate analysis, and a penalized Cox regression model, we identified six genes (C2CD5, CD163, JADE3, BIRC3, TMEM200A, and LINC00877) associated with DLBCL prognosis. In conclusion, we developed a machine learning model integrating clinical characteristics and gene expression profiles that provides a reliable decision-support tool for DLBCL prognosis and treatment selection.
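The abstract reports model performance as a concordance index (C-index). As background, the following is a minimal sketch of Harrell's C-index, the textbook definition of this metric, for right-censored survival data. It is not necessarily the exact implementation the authors used: the function name `harrell_c_index` is hypothetical, and skipping pairs with tied observed times is a simplifying assumption (library implementations differ in how they treat such ties). A pair of patients is comparable when the one with the shorter follow-up time had an observed event; the model is concordant on that pair if it assigned that patient the higher risk score.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    Hypothetical helper for illustration. A pair is comparable when the
    subject with the shorter observed time had an observed event (1).
    Within a comparable pair, the prediction is concordant if that subject
    also has the higher risk score; tied risk scores count as half.
    Assumption: pairs with equal observed times are skipped.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")

    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplifying assumption: ignore tied follow-up times
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time was censored: true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (not from the study): follow-up times,
# event indicators (1 = death observed, 0 = censored), predicted risk.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.6, 0.4, 0.2]))  # 1.0
print(harrell_c_index(times, events, [0.9, 0.3, 0.4, 0.5]))  # 0.6
```

A C-index of 1.0 means the risk scores rank every comparable pair correctly, while 0.5 corresponds to random ranking; on this scale, the reported values of 0.832 (training) and 0.758 (validation) indicate substantially better-than-chance discrimination.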
Authors
Lin Lin, Lv Lv, Cai Cai, Nie Nie, Zeng Zeng, Lin Lin, Lin Lin, Wen Wen, Li Li, Su Su