Facial Expression-Based Evaluation of the Emotion Estimation Software Kokoro Sensor in Healthy Individuals: Validation and Reliability Pilot Study.
In recent years, artificial intelligence (AI) systems have increasingly been used to assess emotional states in health care. AI offers a safe, quick, user-friendly, and objective emotional evaluation method. However, evidence supporting its implementation in health care remains limited.
This study aimed to explore the concurrent validity and test-retest reliability of emotion recognition AI based on facial expressions.
In this study, we used the Kokoro Sensor, an accurate and widely recognized automated facial expression recognition system. The Japanese version of the Profile of Mood States-Short Form was used to screen the potential influence of mental states on facial expressions. The study participants made positive, negative, and neutral expressions, which were analyzed by the emotion recognition AI. Agreement between the results of the AI and subjective evaluations was assessed by participants and a researcher using a 4-point Likert-type scale. The facial expressions and emotion analysis process were repeated after a 30-minute interval to investigate reliability. Concurrent validity was evaluated using the content validity index (CVI) and κ coefficient, and test-retest reliability was determined using the κ coefficient.
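The two agreement statistics named above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes the CVI is the proportion of ratings of 3 or 4 on the 4-point scale (the usual item-level definition) and that the weighted κ uses linear disagreement weights over the ordered categories negative, neutral, positive. The abstract states neither choice. The function names and the sample data are hypothetical.

```python
def content_validity_index(ratings, threshold=3):
    """Proportion of ratings at or above `threshold` on a 4-point scale.

    Assumption: ratings of 3 or 4 count as agreement with the AI result.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(r >= threshold for r in ratings) / len(ratings)


def weighted_kappa(a, b, categories):
    """Linearly weighted Cohen's kappa for two equal-length label sequences.

    `categories` lists the labels in their natural order. With exactly two
    categories the result equals the unweighted Cohen's kappa.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Joint proportion table: rows follow `a`, columns follow `b`.
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Linear disagreement weight: 0 on the diagonal, 1 at maximum distance.
    def weight(i, j):
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - d_obs / d_exp


# Hypothetical data for illustration only, not taken from the study.
example_ratings = [4, 4, 3, 2]
intended = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
estimated = ["negative", "neutral", "neutral", "neutral", "positive", "positive"]
order = ["negative", "neutral", "positive"]

cvi = content_validity_index(example_ratings)
kappa = weighted_kappa(intended, estimated, order)
```

Passing only two categories, for example positive and neutral, yields the pairwise κ of the kind reported for distinguishing positive from neutral expressions.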
The study participants were 40 individuals whose mental states did not deviate from the reference range of the Profile of Mood States manual. Based on the participants' ratings, the CVI values for positive, neutral, and negative expressions were 95%, 98%, and 85%, respectively. Based on the researcher's ratings, the corresponding CVI values were 100%, 100%, and 70%. The overall weighted κ coefficient was 0.55 (95% CI 0.44-0.67), indicating moderate agreement. Agreement was almost perfect for distinguishing positive from neutral expressions (κ=0.83, 95% CI 0.70-0.95) but not statistically significant for distinguishing negative from neutral expressions (κ=0.15, 95% CI -0.07 to 0.37). Test-retest reliability analysis showed an overall weighted κ coefficient of 0.66, reflecting substantial reliability. Almost perfect agreement was observed for distinguishing positive from neutral expressions (κ=0.85, 95% CI 0.73-0.97), whereas distinguishing negative from neutral expressions showed limited reliability (κ=0.36, 95% CI 0.16-0.57).
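The verbal labels attached to the κ values above ("moderate", "substantial", "almost perfect") match the commonly used Landis and Koch bands, although the abstract does not name the scale it applied. The sketch below assumes those bands and also shows the simple check behind "not statistically significant": a 95% CI that includes zero. The function names are hypothetical.

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch agreement label (assumed scale)."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    if kappa < 0.0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label


def ci_excludes_zero(lower, upper):
    """True when the confidence interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# Kappa values and 95% CIs as reported in the abstract.
overall_validity = landis_koch_label(0.55)
positive_vs_neutral = landis_koch_label(0.83)
overall_reliability = landis_koch_label(0.66)
negative_vs_neutral_significant = ci_excludes_zero(-0.07, 0.37)
```

Under this assumed scale, the negative-versus-neutral values of 0.15 and 0.36 would fall in the "slight" and "fair" bands, consistent with the abstract's description of that agreement as limited.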
Our findings suggest that the Kokoro Sensor may be useful for identifying positive affect, given its acceptable concurrent validity for overall valence estimation and its high agreement for distinguishing positive from neutral expressions. However, concurrent validity for negative expressions did not meet the prespecified benchmark based on the researcher's ratings, and agreement for distinguishing negative from neutral expressions was limited, which may constrain clinical utility for detecting negative affect. Therefore, in clinical settings, the Kokoro Sensor should be used as an assistive tool rather than a stand-alone method.