Artificial Intelligence for Opioid Safety Surveillance from Clinical Text: A Clinically Focused Review.
Opioid-related iatrogenic harms, including opioid use disorder, overdose, and opioid-induced respiratory depression, constitute a major patient safety challenge. Although clinicians document key safety signals in unstructured clinical narratives, many of these indicators are not readily captured by conventional surveillance approaches that rely on structured administrative data. This clinically focused narrative review synthesizes 47 empirical studies published between 2009 and 2025 that applied artificial intelligence (AI) methods to identify opioid-related harms from clinical text and to address the resulting ascertainment gap. Across studies, administrative coding systems, including ICD-10, often under-ascertain opioid-related events, whereas text-based AI can identify additional cases and contextual details often documented primarily in narrative records, such as fluctuating mental status, suspected drug causality, and responses to naloxone. Methodologically, the literature has progressed from interpretable rule-based lexicons to machine learning and deep learning models and, more recently, to transformer-based approaches, including large language models (LLMs) for classification and schema-driven extraction. Rule-based systems established the feasibility of transparent surveillance and frequently recovered clinically documented cases missed by billing codes. Subsequent supervised and deep learning approaches expanded scalability and, in a smaller subset of studies, were integrated into electronic health record workflows with operational metrics reported. More recent transformer- and LLM-based studies emphasize richer extraction schemas and benchmark development, including characterization of overdose context and intentionality and identification of potential prodromal neurocognitive signals, although external validation, calibration, and prospective outcome evaluation remain inconsistently reported. 
Given that the evidence base is predominantly retrospective and that clinical workflow studies remain comparatively few, a pragmatic near-term clinical role is detection-to-triage decision support, in which systems surface candidate cases with reviewable evidence for clinician adjudication, rather than autonomous diagnosis. Future progress will require greater standardization of phenotype definitions, routine equity auditing and subgroup reporting, stronger external validation and calibration at operational thresholds, and a shift from retrospective discrimination metrics toward prospective assessments of clinical workflow impact, clinical utility, and patient-centered outcomes.