Explainable prediction of MDR/RR-TB in tuberculosis-diabetes mellitus multimorbidity: a machine learning model developed and validated in a dual-center study.
Tuberculosis-diabetes mellitus (TB-DM) multimorbidity significantly increases the risk of multidrug-resistant/rifampicin-resistant tuberculosis (MDR/RR-TB). Early risk stratification tools for this high-risk population remain lacking.
To develop and validate an interpretable machine learning (ML) model for predicting MDR/RR-TB in patients with TB-DM multimorbidity, and to identify key predictive factors using explainable artificial intelligence.
This dual-center retrospective study enrolled 245 patients with TB-DM multimorbidity from January 2019 to December 2022. Seven machine learning algorithms were constructed and validated with 10-fold cross-validation. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC), accuracy, precision, recall, F1-score, calibration curve, and decision curve analysis (DCA). SHapley Additive exPlanations (SHAP) was applied to identify critical predictive factors.
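The 10-fold cross-validation described above can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not say whether the folds were stratified by outcome, shuffled, or seeded, so the shuffling, the seed, and the helper name `k_fold_indices` are assumptions made for illustration. Only the cohort size (245) and the number of folds (10) come from the text.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for plain (unstratified) k-fold CV.

    Hypothetical helper: shuffling and the seed are assumptions, not
    details reported in the abstract.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, test_idx


# 245 patients split into 10 folds: each patient is held out exactly once.
folds = list(k_fold_indices(245, k=10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
```

With 245 patients, each held-out fold contains only 24 or 25 patients, so per-fold metrics are likely to vary noticeably from fold to fold.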
The random forest (RF) model achieved the best performance, with an AUC-ROC of 0.818, accuracy of 0.806, precision of 0.688, recall of 0.611, and F1-score of 0.647. The moderate recall corresponds to a false-negative rate (FNR) of 0.389 (1 - 0.611), which supports using the model as a triage tool rather than a stand-alone diagnostic test. Calibration and DCA confirmed robust predictive reliability and substantial clinical net benefit within a clinically relevant threshold range of 0.06-0.80. SHAP analysis identified the symptom-to-diagnosis interval, tuberculosis (TB) treatment history, treatment adherence, pulmonary cavitation, and smoking history as the top five critical predictors.
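The relationships between the reported metrics can be checked with a short sketch. The precision and recall values are taken from the abstract; the function names are hypothetical. The `net_benefit` function uses the standard decision curve analysis definition (true-positive rate minus false-positive rate weighted by the odds of the threshold probability). The abstract reports no confusion-matrix counts, so that function is not applied to any study data here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def false_negative_rate(recall):
    """Share of true MDR/RR-TB cases the model misses (1 - recall)."""
    return 1.0 - recall


def net_benefit(tp, fp, n, threshold):
    """Net benefit at a given threshold probability, as used in DCA.

    tp, fp: true- and false-positive counts at that threshold.
    n: total number of patients. threshold must lie strictly in (0, 1).
    """
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Values reported in the abstract for the random forest model.
precision, recall = 0.688, 0.611

f1 = f1_score(precision, recall)    # rounds to 0.647, the reported F1-score
fnr = false_negative_rate(recall)   # 0.389: about 39% of resistant cases missed
```

The reported F1-score is therefore consistent with the reported precision and recall, and the false-negative rate makes concrete why the model is positioned as a triage aid.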
The interpretable RF model accurately and reliably predicts the risk of MDR/RR-TB in patients with TB-DM multimorbidity. The symptom-to-diagnosis interval was the highest-ranked predictor in the SHAP analysis. This model can assist clinical triage, early intervention, and personalized management.
Authors
Zhong Zhong, Liu Liu, Zhang Zhang, Tang Tang, Lu Lu, Chen Chen, Pang Pang, Chen Chen, Li Li, Ding Ding, Ma Ma