Evaluating empirical calibration of P-values under unmeasured confounding bias: a simulation study and real-world application.

To evaluate whether empirical calibration of P-values using negative controls can effectively control Type I and Type II errors under unmeasured confounding bias in both simulated and real-world observational settings.

A simulation study was conducted under five settings reflecting different degrees of adherence to the U-comparability assumption, that is, the extent to which negative controls share the same unmeasured confounding structure as the exposure of interest. These included three primary scenarios (ideal, realistic, and violation of U-comparability) and two mixed scenarios reflecting partial violations. We varied sample size, the direction and strength of unmeasured confounding bias, and the number of negative controls. The method was also applied to UK Biobank data to evaluate the association between hypertension and peripheral artery disease (PAD) in individuals with type 2 diabetes mellitus.

Standard logistic regression showed inflated Type I error rates across almost all settings, peaking at 44.2% under realistic U-comparability with a sample size of 20,000. In contrast, empirical calibration generally controlled Type I error close to the nominal 5% level and reduced bias by 80-100% under both ideal and realistic U-comparability. Type I error control improved with more negative controls, while Type II error control was influenced by whether the unmeasured confounding bias acted in the same or opposite direction as the true exposure-outcome effect. In the UK Biobank case study, 4 of 15 negative controls showed P < 0.05 after adjustment for measured confounders, indicating residual unmeasured confounding. After empirical calibration with 5, 10 or 15 negative controls, the association between hypertension and PAD remained statistically significant (calibrated P ≈ 0.004-0.006).

Empirical calibration of P-values can mitigate residual unmeasured confounding and reduce Type I error inflation in observational studies. Its performance depends on the validity and number of negative controls.
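
As an illustration of the general idea, the following is a minimal sketch of P-value calibration with negative controls. It is not the procedure used in the study. It assumes that the log odds ratios estimated for the negative controls (whose true effect is null) follow a Gaussian "empirical null" with mean mu and standard deviation sigma, and it fits that null with a simple method-of-moments approximation rather than maximum likelihood. The function names and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def fit_empirical_null(log_estimates, std_errors):
    """Method-of-moments fit of a Gaussian empirical null (assumption).

    log_estimates: log odds ratios of the negative controls.
    std_errors: their standard errors.
    Returns (mu, sigma), the mean and SD of the systematic error.
    """
    mu = fmean(log_estimates)
    # Observed spread is roughly systematic variance + mean sampling variance.
    total_var = variance(log_estimates, mu)
    sampling_var = fmean([se ** 2 for se in std_errors])
    sigma = math.sqrt(max(total_var - sampling_var, 0.0))
    return mu, sigma


def calibrated_p(log_estimate, std_error, mu, sigma):
    """Two-sided P-value of an estimate against the empirical null."""
    z = (log_estimate - mu) / math.sqrt(sigma ** 2 + std_error ** 2)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical negative-control estimates, for illustration only.
    nc_log_or = [0.10, 0.25, -0.05, 0.30, 0.15]
    nc_se = [0.08, 0.10, 0.09, 0.12, 0.07]
    mu, sigma = fit_empirical_null(nc_log_or, nc_se)

    # Hypothetical exposure-outcome estimate of interest.
    print("uncalibrated P:", calibrated_p(0.40, 0.10, 0.0, 0.0))
    print("calibrated P:  ", calibrated_p(0.40, 0.10, mu, sigma))
```

With mu = 0 and sigma = 0 the calibrated P-value reduces to the conventional Wald P-value. When the negative controls are shifted in the same direction as the estimate of interest, the calibrated P-value is larger than the conventional one, which is how the approach counteracts Type I error inflation.
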
Diabetes
Diabetes type 2
Care/Management

Authors

Wang Wang, Li Li, Lu Lu, Wang Wang, Huang Huang, Chen Chen