Virtual reality-induced emotion recognition with deep learning-based multimodal physiological feature fusion
Recognizing emotions objectively and accurately remains challenging: conventional approaches suffer from limited ecological validity, incomplete information, and constrained model performance. This study addresses these limitations by investigating a novel framework that integrates ecologically valid virtual reality (VR) emotion elicitation with deep learning-based fusion of multimodal physiological signals.
An immersive VR environment was developed to elicit three target emotional states: positive, neutral, and negative. Synchronized physiological signals, comprising electroencephalography (EEG), electrocardiography (ECG), and galvanic skin response (GSR), were recorded from 20 healthy participants alongside subjective self-assessment data. After preprocessing and feature extraction, a nested cross-validation procedure was employed to prevent data leakage: within each of the five folds, feature selection (one-way repeated-measures ANOVA, α = 0.05) was performed solely on the training data. A hybrid architecture combining principal component analysis (PCA) with a long short-term memory (LSTM) network was then used for dimensionality reduction and modeling: PCA retained components explaining 90% of the cumulative variance, and the LSTM layer contained 96 hidden units, followed by three fully connected layers with dropout regularization. Model performance was evaluated within this cross-validation framework and compared against baseline models: support vector machine (SVM), random forest (RF), k-nearest neighbors (k-NN), and extreme gradient boosting (XGBoost).
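As a concrete illustration of this leakage-free pipeline (a minimal sketch, not the authors' implementation), the following Python code wires together fold-wise feature selection, PCA retaining 90% variance, and a 96-unit LSTM with three dense layers and dropout, using scikit-learn and Keras on synthetic data. The data shapes, the plain one-way F-test used as a runnable stand-in for the repeated-measures ANOVA, the dense-layer widths, dropout rates, and training settings are all assumptions; only the elements stated in the abstract are taken from the paper.

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import f_classif
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from tensorflow import keras

# Synthetic stand-in data: per-trial sequences of windowed features.
rng = np.random.default_rng(0)
n_trials, n_windows, n_feats = 180, 10, 40            # illustrative shapes
X = rng.normal(size=(n_trials, n_windows, n_feats))   # window-level features
y = np.repeat([0, 1, 2], n_trials // 3)               # negative / neutral / positive
X[..., :10] += 0.3 * y[:, None, None]                 # make some features informative

def build_pca_lstm(n_windows, n_comp, n_classes=3):
    # 96-unit LSTM followed by three fully connected layers with dropout,
    # per the abstract; layer widths and dropout rates are assumptions.
    model = keras.Sequential([
        keras.Input(shape=(n_windows, n_comp)),
        keras.layers.LSTM(96),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

accs = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    X_tr, X_te, y_tr, y_te = X[tr], X[te], y[tr], y[te]

    # Feature selection on the training fold only (alpha = 0.05), so the
    # test fold never influences which features survive.
    _, pvals = f_classif(X_tr.mean(axis=1), y_tr)
    keep = pvals < 0.05
    X_tr, X_te = X_tr[..., keep], X_te[..., keep]

    # Scaler and PCA are likewise fitted on training windows only;
    # n_components=0.90 keeps components explaining 90% of the variance.
    flat_tr = X_tr.reshape(-1, X_tr.shape[-1])
    scaler = StandardScaler().fit(flat_tr)
    pca = PCA(n_components=0.90).fit(scaler.transform(flat_tr))
    to_seq = lambda A: pca.transform(
        scaler.transform(A.reshape(-1, A.shape[-1]))
    ).reshape(A.shape[0], A.shape[1], -1)
    Z_tr, Z_te = to_seq(X_tr), to_seq(X_te)

    model = build_pca_lstm(Z_tr.shape[1], Z_tr.shape[2])
    model.fit(Z_tr, y_tr, epochs=30, batch_size=32, verbose=0)
    accs.append(model.evaluate(Z_te, y_te, verbose=0)[1])

print(f"mean accuracy: {np.mean(accs):.3f} +/- {np.std(accs):.3f}")

The key design point this sketch preserves is that every data-dependent step (feature selection, scaling, PCA) is fitted inside each training fold and merely applied to the held-out fold, which is what keeps the reported accuracy free of selection leakage.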
Subjective evaluation results confirmed the effectiveness of VR-based emotion elicitation. At the group level, one-way repeated-measures analysis of variance revealed significant main effects of emotional state (p < 0.05) on multiple physiological features: EEG frontal alpha asymmetry indices (AI_F4/F3, AI_F8/F7), ECG indices (SDNN, RMSSD, LF/HF ratio, sample entropy), and GSR measures (SCL, NS.SCRs). Under the nested five-fold cross-validation described above, the PCA-LSTM model achieved a mean accuracy of 87.18% ± 2.28%, significantly outperforming SVM (75.83% ± 4.25%), RF (78.89% ± 6.85%), k-NN (72.78% ± 5.21%), and XGBoost (81.67% ± 5.83%).
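To make the group-level test concrete, the sketch below runs the same kind of one-way repeated-measures ANOVA with statsmodels on synthetic data shaped like the study's design (20 participants × 3 conditions). The column names, the asymmetry values, and the effect sizes are illustrative assumptions; the frontal alpha asymmetry index itself is commonly computed as the log alpha power at the right electrode minus that at the left (e.g., ln F4 − ln F3).

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic long-format data standing in for the real measurements:
# one row per participant x emotion condition (balanced design).
rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(1, 21), 3)
emotions = np.tile(["negative", "neutral", "positive"], 20)
effect = {"negative": -0.2, "neutral": 0.0, "positive": 0.2}  # assumed effect sizes
df = pd.DataFrame({
    "subject": subjects,
    "emotion": emotions,
    "ai_f4_f3": [effect[e] for e in emotions] + rng.normal(0, 0.1, 60),
})

# One-way repeated-measures ANOVA: main effect of emotion on AI_F4/F3.
res = AnovaRM(data=df, depvar="ai_f4_f3", subject="subject",
              within=["emotion"]).fit()
print(res)  # reports the F statistic and p-value for the emotion main effect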
This study demonstrates that integrating an ecologically valid VR emotion-elicitation paradigm with a multimodal PCA-LSTM fusion model enhances both the objectivity and the accuracy of emotion recognition. The proposed framework offers an effective route past the ecological-validity and quantification-precision bottlenecks of traditional methods, and shows preliminary application potential in intelligent human-computer interaction and mental-health monitoring.