CFRAFN: A Cross-Feature Residual Attention Fusion Network for Major Depressive Disorder Prediction Using Clinical Voice Recordings.

Major depressive disorder (MDD) is a prevalent mental disorder that places a significant burden on individuals and society, and timely identification and intervention are essential for effective management. Voice data have been used as behavioral indicators of MDD, offering valuable insights into an individual's mental state. In this study, we collected voice data from 221 patients diagnosed with MDD at the inpatient ward of the Department of Psychiatry and Psychosomatics, Zhongda Hospital, Southeast University, alongside 113 healthy controls, to construct the Chinese depressive voice dataset. We proposed the cross-feature residual attention fusion network (CFRAFN), which combines extended Geneva minimalistic acoustic parameter set (eGeMAPS) features with high-dimensional embeddings extracted from the pretrained VGGish model to capture MDD-associated phonetic patterns. Specifically, CFRAFN uses differentiated residual blocks to maintain training stability in its deep hierarchical structure. Furthermore, a self-attention fusion strategy dynamically weights the significance of each feature modality, ensuring effective feature integration and consequently improving MDD prediction accuracy. Experimental results demonstrated that CFRAFN achieved an area under the receiver operating characteristic curve of 0.924 on an independent test set and significantly outperformed 11 baseline models in 5-fold cross-validation.
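To make the fusion idea concrete, the following is a minimal NumPy sketch of a two-branch network with residual blocks and self-attention over the two feature modalities. It is an illustration under stated assumptions, not the authors' implementation: the feature sizes (88 for eGeMAPS functionals, 128 for a time-pooled VGGish embedding), the hidden size, the single plain residual block per branch (a stand-in for the "differentiated" blocks, whose design the abstract does not specify), the mean pooling after attention, and every function name are hypothetical choices. Weights are random, so the outputs carry no predictive meaning.

```python
import numpy as np

rng = np.random.default_rng(0)

EGEMAPS_DIM = 88   # assumed size of the eGeMAPS feature vector per recording
VGGISH_DIM = 128   # assumed size of a time-pooled VGGish embedding
HIDDEN = 32        # hypothetical shared hidden size


def dense(in_dim, out_dim):
    """Return randomly initialised (weights, bias) for one linear layer."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)


def residual_block(x, w, b):
    """Plain residual block: y = x + ReLU(xW + b)."""
    return x + np.maximum(x @ w + b, 0.0)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fusion(tokens, wq, wk, wv):
    """Scaled dot-product self-attention over modality tokens.

    tokens has shape (batch, n_modalities, HIDDEN). Returns the fused
    vector (batch, HIDDEN) and the attention weights
    (batch, n_modalities, n_modalities).
    """
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    fused = (weights @ v).mean(axis=1)  # average the attended modality tokens
    return fused, weights


def init_params():
    """Build all (hypothetical) layer parameters."""
    return {
        "proj_a": dense(EGEMAPS_DIM, HIDDEN),
        "proj_b": dense(VGGISH_DIM, HIDDEN),
        "res_a": dense(HIDDEN, HIDDEN),
        "res_b": dense(HIDDEN, HIDDEN),
        "wq": dense(HIDDEN, HIDDEN)[0],
        "wk": dense(HIDDEN, HIDDEN)[0],
        "wv": dense(HIDDEN, HIDDEN)[0],
        "out": dense(HIDDEN, 1),
    }


def forward(egemaps, vggish, params):
    """Project each modality, refine it, fuse with attention, classify."""
    h_a = egemaps @ params["proj_a"][0] + params["proj_a"][1]
    h_b = vggish @ params["proj_b"][0] + params["proj_b"][1]
    h_a = residual_block(h_a, *params["res_a"])
    h_b = residual_block(h_b, *params["res_b"])
    tokens = np.stack([h_a, h_b], axis=1)  # (batch, 2, HIDDEN)
    fused, weights = attention_fusion(
        tokens, params["wq"], params["wk"], params["wv"]
    )
    logit = fused @ params["out"][0] + params["out"][1]
    prob = 1.0 / (1.0 + np.exp(-logit))  # sigmoid for the binary MDD label
    return prob[:, 0], weights


# Synthetic stand-in features for a batch of 4 recordings.
params = init_params()
egemaps = rng.normal(size=(4, EGEMAPS_DIM))
vggish = rng.normal(size=(4, VGGISH_DIM))
prob, weights = forward(egemaps, vggish, params)
```

Each row of `weights` sums to 1 and indicates how strongly one modality token attends to the other, which is one way to read "dynamically weighting the significance of each feature modality". A trained model would learn these parameters from labeled recordings rather than sampling them at random.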
Mental Health
Care/Management

Authors

Pan Pan, Feng Feng, Sun Sun, Xu Xu, Zhai Zhai, Wu Wu, Tan Tan, Yuan Yuan, Cao Cao, Maimaitimin Maimaitimin, Yao Yao, Guo Guo, Xuan Xuan, Li Li, Xu Xu