Large language models for psychosocial risk assessment: A multi-method evaluation across suicide, intimate partner violence, and substance misuse.

Psychosocial risk assessment is a cornerstone of mental health care, yet it remains resource-intensive and inconsistently delivered across domains such as suicide, intimate partner violence (IPV), and substance misuse. Recent advances in large language models (LLMs) raise the possibility of scalable conversational agents capable of detecting and evaluating psychosocial risk. Across three interlinked studies, we evaluated the performance of LLMs in this context. Study 1 benchmarked GPT-4 and Claude 3 Sonnet against vignettes constructed from participants' lived experience, finding high accuracy in detecting risk domains and substantial agreement with participant-rated severity, though suicidality proved more challenging than IPV or substance misuse. Study 2 examined participants' perceptions of LLM-generated responses, revealing that most judged them accurate, empathic, and clinically useful, with no differences across models or domains. Study 3 implemented a supervised, three-agent GPT-4o-based chatbot system comprising a therapeutic chatbot agent, a supervisor agent for risk detection, and a JSON-based assessor for structured evaluation. The therapeutic agent successfully completed full risk assessments most of the time while maintaining therapeutic quality. Together, these studies suggest that LLMs can contribute to psychosocial risk detection and structured assessment under controlled conditions, while underscoring the need for careful supervision, rigorous validation, and clearly defined boundaries before consideration of real-world clinical deployment.
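
As an illustration of how a supervised, three-agent arrangement of the kind described in Study 3 might be wired together, the sketch below pairs a therapeutic chat agent with a supervisor that screens each turn for risk and a JSON-based assessor that returns a structured evaluation. This is a minimal sketch under assumed prompts, an assumed risk schema, and the OpenAI chat-completions API with GPT-4o; it is not the authors' implementation.

    """
    Illustrative sketch (not the authors' implementation) of a supervised,
    three-agent risk-assessment pipeline: a therapeutic chat agent, a
    supervisor that screens each turn for risk, and an assessor that returns
    a structured JSON evaluation. Prompts, the "gpt-4o" model identifier,
    and the risk schema are assumptions for demonstration only.
    """
    from openai import OpenAI
    import json

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    THERAPIST_PROMPT = (
        "You are a supportive, non-judgemental conversational agent. "
        "Respond empathically and ask gentle follow-up questions."
    )
    SUPERVISOR_PROMPT = (
        "You monitor a therapeutic conversation. Reply with one word: "
        "'suicide', 'ipv', 'substance', or 'none', naming any psychosocial "
        "risk domain present in the user's latest message."
    )
    ASSESSOR_PROMPT = (
        "You are a structured risk assessor. Return a JSON object with the keys "
        "'domain', 'severity' (low/moderate/high), and 'rationale'."
    )

    def call(system: str, user: str, json_mode: bool = False) -> str:
        """Single chat-completion call; JSON mode enforces a parseable response."""
        response = client.chat.completions.create(
            model="gpt-4o",  # assumed model identifier
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            response_format={"type": "json_object"} if json_mode else {"type": "text"},
        )
        return response.choices[0].message.content

    def handle_turn(user_message: str) -> dict:
        """Run one conversational turn through all three agents."""
        # 1. Therapeutic agent produces the user-facing reply.
        reply = call(THERAPIST_PROMPT, user_message)

        # 2. Supervisor screens the same message for a risk domain.
        flag = call(SUPERVISOR_PROMPT, user_message).strip().lower()

        # 3. If a risk domain is flagged, the assessor returns a structured JSON record.
        assessment = None
        if flag != "none":
            assessment = json.loads(
                call(ASSESSOR_PROMPT,
                     f"Risk domain: {flag}\nMessage: {user_message}",
                     json_mode=True)
            )

        return {"reply": reply, "risk_flag": flag, "assessment": assessment}

    if __name__ == "__main__":
        result = handle_turn("I've been drinking a lot more since the breakup.")
        print(json.dumps(result, indent=2))

In this arrangement the user-facing reply and the risk screening are decoupled, so the supervisor and assessor can run on every turn without shaping the therapeutic conversation; the JSON output is what would feed a downstream structured evaluation.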
Mental Health Care/Management

Authors

Vowels, Vohra, Li, Zeinoddin, Elswick, Marcantonio, Wood, Vowels