Data-driven biomarker discovery and risk profiling for polycystic ovary syndrome in Indian women using ensemble learning.

Despite diagnostic advancements in India, the scarcity of Indian polycystic ovary syndrome (PCOS) data and varied diagnostic standards contribute to delays in PCOS detection, particularly in rural areas.

We aim to build a predictive model based on an extensive dataset derived from Indian studies and perform risk-based stratification of samples.

The PubMed database was queried for studies focused on the pathophysiology of PCOS in Indian women. Based on inclusion and exclusion criteria, six studies were selected. The corresponding clinical data were statistically synthesised from study-specific baseline characteristics. The integrated dataset consisted of 11,258 samples (nPCOS = 7342; nControl = 3916) with 14 attributes: disease (PCOS vs control), age, body mass index, cholesterol, triglycerides, high-density and low-density lipoproteins, luteinising hormone (LH), follicle-stimulating hormone (FSH), testosterone, menarche age, systolic and diastolic blood pressure, and yearly menstrual cycles. After data pre-processing, missing-value imputation, and feature engineering, model benchmarking was conducted using LazyPredict. LightGBM was selected for further hyperparameter tuning based on performance metrics. Lastly, feature importance analysis was performed, and predicted probabilities were used to assign samples to risk categories.
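The final step, mapping predicted probabilities to risk categories, can be illustrated with a minimal sketch. The abstract does not report the cut-offs or the number of categories, so the thresholds (0.33 and 0.66), the three labels, and the function names below are assumptions chosen purely for illustration, not the authors' settings.

```python
# Hypothetical cut-offs: the abstract does not state the thresholds used.
LOW_MAX = 0.33   # probabilities below this are labelled "low"
HIGH_MIN = 0.66  # probabilities at or above this are labelled "high"


def risk_category(probability: float) -> str:
    """Map a predicted PCOS probability in [0, 1] to a risk label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < LOW_MAX:
        return "low"
    if probability < HIGH_MIN:
        return "moderate"
    return "high"


def stratify(probabilities):
    """Count how many samples fall into each risk category."""
    counts = {"low": 0, "moderate": 0, "high": 0}
    for p in probabilities:
        counts[risk_category(p)] += 1
    return counts


# Toy probabilities standing in for a classifier's predicted PCOS probabilities.
example = [0.05, 0.40, 0.91, 0.70, 0.20]
print(stratify(example))  # {'low': 2, 'moderate': 1, 'high': 2}
```

In practice the probabilities would come from the tuned classifier, and the cut-offs would need to be justified clinically rather than fixed in advance.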

The optimised LightGBM model achieved 96.18% accuracy, 97.51% precision, 96.65% recall, and an area under the receiver operating characteristic curve (ROC-AUC) of 99.31%. Testosterone, menstrual cycles per year, triglycerides, LH, and diastolic blood pressure were the five most important predictors of PCOS. Risk categorisation of samples showed substantial alignment with actual diagnoses, validating the model's clinical significance.
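For readers unfamiliar with the reported metrics, the sketch below shows how accuracy, precision, and recall follow from binary predictions. The labels are toy values, not the study's data, and the function name is hypothetical. ROC-AUC is omitted because it requires ranked probabilities rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels (1 = PCOS, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Precision answers "of the samples flagged as PCOS, how many truly were?", while recall answers "of the true PCOS samples, how many were flagged?"; both matter when the classes are imbalanced, as they are in the integrated dataset.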

This study introduces the first comprehensively synthesised PCOS dataset for Indian women.

Our framework facilitates prompt risk detection, providing an adaptable methodology for decision-making in PCOS management.

Authors

Verma, Agarwal, Sharma, Sindwani