Transcriptomic and Machine Learning-Based Classification of Unprovoked Versus Provoked Venous Thromboembolism Using Public Data.
Venous thromboembolism (VTE) comprises provoked and unprovoked forms; accurate classification informs anticoagulation duration and recurrence risk but is limited by its reliance on clinical phenotyping. We analysed the GSE48000 whole-blood transcriptomes to identify differentially expressed genes (DEGs) between provoked and unprovoked VTE. DEGs underwent Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. A random forest ranked features, and an artificial neural network (ANN) was trained on the top 30 genes; its discrimination was evaluated using stratified 10-fold cross-validation with receiver operating characteristic (ROC) analysis. The resulting 30-gene signature distinguished the two subtypes. Most genes showed lower expression in unprovoked VTE, with notable upregulation of GDF2, LGALS2, and LOC100130229. Enrichment analyses highlighted immune regulation and vesicle-transport pathways. The ANN achieved an area under the ROC curve (AUC) of 0.799 in this dataset. Transcriptomic profiling coupled with machine learning distinguished provoked from unprovoked VTE with acceptable discrimination, supporting the feasibility of artificial intelligence (AI)-based molecular diagnostics for classification and risk assessment. Prospective validation in larger, independent cohorts is warranted.
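The evaluation scheme named above, stratified 10-fold cross-validation scored by ROC AUC, can be illustrated with a minimal standard-library sketch. This is not the authors' code. The function names (`stratified_folds`, `roc_auc`, `cross_validated_auc`, `fit_centroids`, `score_centroids`) are hypothetical, labels are assumed to be coded 0/1, and a simple centroid-difference scorer stands in for the ANN purely to keep the example self-contained.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)
    return folds


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(features, labels, fit, score, k=10, seed=0):
    """Score every sample with a model that never saw it, then pool the AUC."""
    out_of_fold = [None] * len(labels)
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        for i in test_idx:
            out_of_fold[i] = score(model, features[i])
    return roc_auc(labels, out_of_fold)


# Stand-in classifier (an assumption, NOT the ANN from the abstract):
# project each sample onto the difference between the class centroids.
def fit_centroids(X, y):
    dims = range(len(X[0]))
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    mu_pos = [sum(x[d] for x in pos) / len(pos) for d in dims]
    mu_neg = [sum(x[d] for x in neg) / len(neg) for d in dims]
    return mu_pos, mu_neg


def score_centroids(model, x):
    mu_pos, mu_neg = model
    return sum((a - b) * v for a, b, v in zip(mu_pos, mu_neg, x))
```

As a quick check, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive-negative pairs are ranked correctly. In a pipeline like the one described, any feature ranking would normally be repeated inside each training fold, so that held-out samples do not influence which genes are selected; whether the study did this is not stated in the abstract.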