Utilizing synthetic data for privacy-preserving AI modeling in radiomics: a case study.

Preserving privacy in healthcare AI is critical because patient data are sensitive, particularly radiomics, which provide a unique "signature" that could be used to re-identify individuals. This study explores synthetic radiomics data as a privacy-preserving strategy for AI-driven prostate cancer aggressiveness classification. Radiomic features were extracted from 4,588 retrospective and 1,369 prospective multiparametric MRI (mpMRI) cases across 12 EU centers. Three generators were explored: (i) a custom version of Bayesian Gaussian Mixture Models with optimal components estimation (BGMMOCE), (ii) the Conditional Tabular Generative Adversarial Network (CTGAN), and (iii) the Tabular Variational AutoEncoder (TVAE). Data fidelity was assessed using statistical measures such as the Jensen-Shannon divergence (JSD) and the Hellinger distance (HD). The BGMMOCE achieved higher fidelity (e.g., JSD 0.08, HD 0.23) than the other two generators. A Random Forest (RF) classifier trained on BGMMOCE-generated data and tested on real prospective cases performed comparably to the model trained only on real data, highlighting the balance between fidelity and privacy.

Clinical Relevance: This work addresses the challenge of preserving patient privacy in AI-driven radiomics analysis by leveraging synthetic data while maintaining model performance.
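The two fidelity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how JSD and HD were computed, so everything here is an assumption: the feature values are binned into a shared histogram, the JSD uses base-2 logarithms (so both measures lie in [0, 1], with 0 meaning identical distributions), and the data are toy Gaussian samples rather than radiomic features. The function names `shared_histograms`, `js_divergence`, and `hellinger` are hypothetical.

```python
import math
import random


def shared_histograms(real, synthetic, n_bins=20):
    """Bin both samples over the same range and return two probability vectors."""
    lo = min(min(real), min(synthetic))
    hi = max(max(real), max(synthetic))
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range

    def hist(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    return hist(real), hist(synthetic)


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs (assumed convention)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing to the KL sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2
    )


# Toy data (assumption): one "real" feature and two candidate synthetic copies.
rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(2000)]
close = [rng.gauss(0.05, 1.0) for _ in range(2000)]  # near-faithful generator
far = [rng.gauss(2.0, 1.0) for _ in range(2000)]     # poorly matched generator

for name, synth in (("close", close), ("far", far)):
    p, q = shared_histograms(real, synth)
    print(name, round(js_divergence(p, q), 3), round(hellinger(p, q), 3))
```

Lower values indicate that the synthetic distribution sits closer to the real one. In a multi-feature setting the scores would presumably be computed per feature and then summarized, but the abstract does not describe the aggregation.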
Cancer
Access
Care/Management
Advocacy

Authors

Pezoulas, Mylona, Tachos, Zaridis, Apostolidis, Papanikolaou, Regge, Marias, Tsiknakis, Fotiadis