Interpretable miRNA-based prediction model for early detection of pancreatic cancer: Development and cross-platform validation.
Pancreatic cancer remains one of the most lethal malignancies, largely due to delayed diagnosis. Although microRNA (miRNA) biomarkers show promise, many previous studies lack cross-platform validation and model interpretability, limiting clinical applicability.
We developed and externally validated an interpretable diagnostic model based on a 20-miRNA signature using publicly available datasets. A total of 801 samples were included, of which 767 were used for model training and validation. The training cohort comprised GSE59856 and GSE85589 (n = 216), and independent validation cohorts included TCGA-PAAD and GTEx pancreas (n = 585), with additional serum-based validation (GSE128508; n = 30). Feature selection and model development were conducted exclusively within the training cohort. A Random Forest classifier was applied, and model interpretability was assessed using SHAP analysis. Diagnostic performance was evaluated using cross-validation and independent external validation.
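Restricting feature selection to the training cohort can be illustrated with a minimal sketch. This is not the authors' pipeline: the ranking criterion (absolute difference in class means), the helper names `select_features` and `apply_selection`, and the toy expression values are all assumptions made for illustration. The point is only that the selected column indices are derived from training data and then applied unchanged to the external cohort.

```python
from statistics import mean


def select_features(X_train, y_train, k):
    """Rank features by absolute difference in class means (training data only).

    Hypothetical criterion chosen for illustration; the study's actual
    selection method is not specified here.
    """
    n_features = len(X_train[0])
    scores = []
    for j in range(n_features):
        cases = [row[j] for row, label in zip(X_train, y_train) if label == 1]
        controls = [row[j] for row, label in zip(X_train, y_train) if label == 0]
        scores.append((abs(mean(cases) - mean(controls)), j))
    # Highest-scoring features first; keep the top k column indices.
    top = sorted(scores, reverse=True)[:k]
    return sorted(j for _, j in top)


def apply_selection(X, selected):
    """Keep only the selected columns; no statistics are recomputed here."""
    return [[row[j] for j in selected] for row in X]


# Toy data (invented): rows are samples, columns are miRNA expression values.
X_train = [
    [5.0, 1.0, 2.0],
    [6.0, 1.1, 2.5],
    [1.0, 0.9, 2.1],
    [0.5, 1.0, 4.0],
]
y_train = [1, 1, 0, 0]  # 1 = cancer, 0 = control
X_external = [[4.0, 1.0, 2.2], [0.8, 1.2, 3.9]]

selected = select_features(X_train, y_train, k=2)
X_train_sel = apply_selection(X_train, selected)
X_external_sel = apply_selection(X_external, selected)
```

Because `apply_selection` only indexes columns, the external cohort contributes nothing to which miRNAs are chosen, which is the property that guards against information leakage.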
The model achieved a cross-validation AUC of 0.87 (95% CI 0.82-0.92), with sensitivity of 84.7% and specificity of 83.1% in the training cohort. External validation across independent RNA-seq and qRT-PCR datasets demonstrated AUC values ranging from 0.78 to 0.83. Performance remained broadly consistent across sample types and platforms. SHAP analysis identified miR-6875-5p, miR-196a-5p, and miR-1246 among the principal contributors to classification. Functional enrichment analysis suggested involvement in canonical cancer-related pathways.
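For readers less familiar with the reported metrics, the following sketch shows how sensitivity, specificity, and AUC are conventionally computed from labels and classifier scores. The function names, the 0.5 decision threshold, and the toy scores are assumptions for illustration and are unrelated to the study data; AUC is computed here as the fraction of case-control pairs in which the case receives the higher score, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random case outranks a random control."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (invented values): 4 cases followed by 4 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric are reported separately.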
We developed and externally validated an interpretable 20-miRNA signature for pancreatic cancer diagnosis with consistent performance across independent cohorts. Although based on retrospective datasets, the structured validation strategy and explainable modeling framework provide a transparent foundation for future prospective evaluation.