Geographic disparities and methodological quality of type 2 diabetes prediction models: a systematic review and meta-analysis of 97 models.

Accurate risk prediction is essential for targeted prevention of type 2 diabetes mellitus (T2DM). However, the global applicability and methodological rigor of existing prediction models remain uncertain.

To systematically review and meta-analyze the geographic distribution, methodological quality, and predictive performance of all published T2DM risk prediction models.

PubMed, Embase, Web of Science, Cochrane Library, CNKI, WanFang, and VIP were searched from inception to December 2025 (eAppendix S1 in the Supplement).

Studies that developed or validated a multivariable prediction model for incident T2DM in general adult populations and reported at least one performance measure.

Two reviewers independently extracted data and assessed risk of bias using the PROBAST tool. A random-effects meta-analysis was used to pool C-statistics. Heterogeneity was explored via subgroup analyses and meta-regression. The study followed TRIPOD-SRMA and PRISMA reporting guidelines.
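As an illustration of the pooling step, the following is a minimal sketch of a DerSimonian-Laird random-effects meta-analysis of C-statistics on the logit scale. The abstract does not state which between-study variance estimator or transformation was used, so both are assumptions here. The function name `pool_auc_random_effects` is hypothetical, and standard errors are back-calculated from reported 95% CIs.

```python
import math


def _logit(p):
    return math.log(p / (1.0 - p))


def _inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_auc_random_effects(aucs, ci_lows, ci_highs):
    """Pool AUCs with a DerSimonian-Laird random-effects model (assumed method).

    Each AUC and its 95% CI bounds must lie strictly between 0 and 1.
    Pooling is done on the logit scale and back-transformed.
    """
    if len(aucs) < 2:
        raise ValueError("need at least two models to pool")

    # Logit-transform estimates; derive SE from the CI width on the logit scale.
    y = [_logit(a) for a in aucs]
    v = [((_logit(hi) - _logit(lo)) / (2 * 1.96)) ** 2
         for lo, hi in zip(ci_lows, ci_highs)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: share of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights and pooled estimate.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))

    return {
        "pooled_auc": _inv_logit(mu),
        "ci_low": _inv_logit(mu - 1.96 * se_mu),
        "ci_high": _inv_logit(mu + 1.96 * se_mu),
        "tau2": tau2,
        "i2_percent": i2,
    }


if __name__ == "__main__":
    # Made-up inputs for illustration only; these are not data from the review.
    result = pool_auc_random_effects(
        aucs=[0.70, 0.80, 0.90],
        ci_lows=[0.68, 0.78, 0.88],
        ci_highs=[0.72, 0.82, 0.92],
    )
    print(result)
```

When the reported estimates disagree far more than their CIs allow, as in the made-up inputs above, `i2_percent` is high and the pooled value is better read as a descriptive summary than as a precise estimate.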

The primary outcome was the geographic origin of models. Secondary outcomes included pooled measures of discrimination (C-statistic/AUC) stratified by region and an overall assessment of methodological quality (PROBAST).

A total of 65 studies comprising 97 distinct prediction models were included (eTable 1). Geographic distribution was highly skewed, with 70.1% of models developed in Asian populations (China: 47.4%; Japan: 13.4%; South Korea: 9.3%), while only 7.2% originated from the US and 4.1% from Europe. Logistic regression was used in 97.9% of models. External validation was performed for only 21 models (21.6%). According to PROBAST, 91.8% of models were at high risk of bias (eTable 2), primarily due to inadequate handling of missing data, lack of external validation, and poor calibration reporting. Meta-analysis revealed wide variation in discrimination by geographic region (eTable 7). US-based models achieved the highest pooled AUC (0.97; 95% CI, 0.94-0.99), but this finding is likely influenced by overfitting, small-sample bias, and publication bias (see Discussion). European models showed a pooled AUC of 0.84 (95% CI, 0.81-0.87), while Chinese models showed the lowest regional performance (AUC, 0.79; 95% CI, 0.76-0.82). Due to very high heterogeneity (I² > 80% in most regions), these pooled estimates should be interpreted as descriptive summaries rather than precise estimates of true regional performance. Performance was lowest in prediabetic cohorts (AUC, 0.72; 95% CI, 0.68-0.76); however, this finding is preliminary due to the limited number of models and high heterogeneity. Funnel plot asymmetry suggested potential publication bias (Egger's test, P = .03). The most frequently included predictors were age (69.1%), body mass index (64.9%), family history of diabetes (44.3%), and waist circumference (39.2%) (eFigure 4 and eTable 3).
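For readers unfamiliar with the asymmetry test mentioned above, here is a minimal sketch of Egger's regression: the standardized effect (effect divided by its standard error) is regressed on precision (1 / standard error), and an intercept far from zero suggests small-study effects. The function name `egger_intercept` is hypothetical, the choice of effect scale is an assumption, and the sketch returns the t-statistic only; a P value would additionally require a t distribution with n - 2 degrees of freedom.

```python
import math


def egger_intercept(effects, standard_errors):
    """Egger's regression intercept via ordinary least squares (sketch).

    effects: per-model effect estimates (for example, logit-transformed AUCs).
    standard_errors: their standard errors, all strictly positive.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three models")

    # Standardized effect (response) and precision (predictor).
    z = [e / s for e, s in zip(effects, standard_errors)]
    x = [1.0 / s for s in standard_errors]

    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))

    slope = sxz / sxx
    intercept = z_bar - slope * x_bar

    # Residual variance and standard error of the intercept.
    sse = sum((zi - (intercept + slope * xi)) ** 2 for xi, zi in zip(x, z))
    s2 = sse / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))

    # t is undefined when the fit is exact (zero residual variance).
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")

    return {
        "intercept": intercept,
        "se_intercept": se_intercept,
        "t_statistic": t_stat,
        "df": n - 2,
    }


if __name__ == "__main__":
    # Made-up values in which less precise models report larger effects.
    print(egger_intercept([0.5, 0.6, 0.8, 1.0, 1.3], [0.05, 0.1, 0.2, 0.3, 0.4]))
```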

T2DM prediction models exhibit striking geographic inequity and poor methodological quality, with over 90% at high risk of bias. The substantial variation in performance by region and the lack of external validation critically limit their global clinical utility. These findings underscore an urgent need for rigorous external validation in diverse populations and de novo model development in under-represented regions, guided by PROBAST and TRIPOD standards.

Not applicable.

Not applicable.
Diabetes
Diabetes type 2
Care/Management

Authors

Queiroz Queiroz, Gadelha Gadelha, Husain Husain