Augmenting large language models to predict social determinants of mental health in opioid use disorder using patient clinical notes.
Identifying social determinants of mental health (SDOMH) in patients with opioid use disorder (OUD) is crucial for estimating risk and enabling early intervention. Extracting such data from unstructured clinical notes is challenging due to annotation complexity and requires advanced natural language processing (NLP) techniques. We propose the Human-in-the-Loop Large Language Model Interaction for Annotation (HLLIA) framework, combined with a Multilevel Hierarchical Clinical-Longformer Embedding (MHCLE) algorithm, to annotate and predict SDOMH variables.
We utilized 2,636 annotated discharge summaries from the Medical Information Mart for Intensive Care (MIMIC-IV) dataset. High-quality annotations were ensured via a human-in-the-loop approach in which annotations generated by large language models (LLMs) were reviewed and refined by human experts. The MHCLE algorithm performed multi-label classification of 13 SDOMH variables and was evaluated against baseline models, including RoBERTa, Bio_ClinicalBERT, ClinicalBERT, and ClinicalBigBird.
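To make the multi-label setup concrete, the sketch below shows how a clinical Longformer-family encoder can be fine-tuned to emit one independent probability per SDOMH label. This is a minimal illustration, not the authors' MHCLE implementation; the Hugging Face checkpoint name "yikuan8/Clinical-Longformer", the 0.5 decision threshold, and the example note text are assumptions.

```python
# Minimal multi-label classification sketch (not the authors' MHCLE implementation).
# Assumes the public checkpoint "yikuan8/Clinical-Longformer" and 13 SDOMH labels.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "yikuan8/Clinical-Longformer"  # assumed clinical Longformer checkpoint
NUM_LABELS = 13                             # one output per SDOMH variable

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # sigmoid outputs with BCE loss per label
)

note = "Discharge summary text describing housing instability and social isolation..."
inputs = tokenizer(note, truncation=True, max_length=4096, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, NUM_LABELS)
probs = torch.sigmoid(logits)              # independent probability per SDOMH label
predicted = (probs > 0.5).int()            # threshold each label at 0.5
```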
The MHCLE model achieved superior performance with 96.29% accuracy and a 95.41% F1-score, surpassing the baseline models. Training-testing policies P1, P2, and P3 yielded accuracies of 98.49%, 90.10%, and 89.04%, respectively, highlighting the importance of human intervention in refining LLM annotations.
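For reference, multi-label accuracy and F1 can be computed as in the short sketch below. The abstract does not state the averaging scheme used, so the micro-averaged F1 and exact-match (subset) accuracy shown here, along with the toy label matrices, are assumptions for illustration only.

```python
# Illustrative multi-label evaluation; averaging choices are assumptions.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# y_true / y_pred: binary indicator matrices of shape (n_notes, n_labels)
y_true = np.array([[1, 0, 1], [0, 1, 0]])   # toy example with 3 labels
y_pred = np.array([[1, 0, 1], [0, 1, 1]])

accuracy = accuracy_score(y_true, y_pred)             # exact-match (subset) accuracy
micro_f1 = f1_score(y_true, y_pred, average="micro")  # micro-averaged F1 across labels
print(f"accuracy={accuracy:.4f}, micro F1={micro_f1:.4f}")
```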
Integrating the MHCLE model with the HLLIA framework offers an effective approach for predicting SDOMH factors from clinical notes, advancing NLP in OUD care. It highlights the importance of human oversight and sets a benchmark for future research.