Comparing computable type 2 diabetes phenotype definitions in identifying populations of interest for clinical research.
Computable phenotype definitions for identifying patients with type 2 diabetes (T2D) in electronic health records (EHRs) vary considerably. This variation complicates the identification of T2D populations for clinical research. To address this problem, this study compares how commonly used phenotype definitions differ in the patients with T2D they identify in EHR data.
A retrospective analysis was performed using clinical data extracted from the EHRs of 207 813 adult patients captured between 2017 and 2019. Five T2D phenotype definitions were compared: (1) Surveillance, Prevention and Management of Diabetes Mellitus, (2) Centers for Medicare and Medicaid Services Chronic Conditions Data Warehouse (CCW), (3) eMERGE Northwestern Group, (4) Durham Diabetes Coalition (DDC) and (5) a definition developed by a panel of experts at Johns Hopkins.
Each phenotype definition identified a different T2D population with its own mix of demographic and clinical characteristics. Although the identified cohorts overlapped, only 22.7% (47 326) of the study population was identified by all five definitions. DDC identified the largest number of patients with T2D (139 832, 67.3%), while the CCW cohort had the highest mean age (65.3 years), the highest percentage of Black patients (35%) and the highest mean Charlson comorbidity score (2.96). The DDC cohort had the lowest mean numbers of inpatient (0.64) and emergency room (1.06) visits.
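The overlap figure is a set intersection over the patient identifiers returned by each definition, divided by the size of the study population; each definition's own share is its cohort size over the same denominator. The minimal Python sketch below assumes every phenotype has already been applied and reduced to a set of patient IDs. The function name `overlap_summary`, the labels `A`, `B`, `C` and the ten-patient toy data are hypothetical illustrations, not the study's data or code.

```python
from functools import reduce


def overlap_summary(cohorts: dict[str, set[str]], population_size: int) -> dict[str, float]:
    """Share of the population identified by each phenotype and by all of them.

    Hypothetical helper: `cohorts` maps a phenotype label to the set of
    patient IDs that phenotype identified; `population_size` is the number
    of patients in the whole study population (the denominator).
    """
    if not cohorts or population_size <= 0:
        raise ValueError("need at least one cohort and a positive population size")
    # Per-phenotype prevalence: cohort size divided by the study population.
    summary = {name: len(ids) / population_size for name, ids in cohorts.items()}
    # Patients identified by every definition: intersection of all cohorts.
    common = reduce(set.intersection, cohorts.values())
    summary["all"] = len(common) / population_size
    return summary


# Toy example with a hypothetical population of 10 patients and 3 phenotypes.
cohorts = {
    "A": {"p1", "p2", "p3", "p4", "p5", "p6"},
    "B": {"p2", "p3", "p4", "p5", "p9"},
    "C": {"p3", "p4", "p5", "p10"},
}
print(overlap_summary(cohorts, population_size=10))
```

Here only p3, p4 and p5 are identified by all three toy definitions, so the common share is 0.3 even though each definition on its own identifies between 40% and 60% of patients, the same pattern of partial overlap described above.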
Our study highlights the complexity of translating commonly agreed clinical definitions of T2D into computable phenotypes applied to retrospective EHR data. Our findings can guide the choice of appropriate phenotypes to identify, enrol and analyse T2D populations of interest using EHR data.