Leveraging Large Language Models for Thyroid Nodule Information Extraction and Matching Across Medical Reports.

Accurate extraction of thyroid nodule features from radiology and pathology reports is clinically essential for guiding patient management decisions, such as surgical intervention or active surveillance. However, manual data extraction from electronic health records is labor-intensive and prone to inter-rater variability. To address this challenge, we evaluated open-source large language models (LLMs) for automating the extraction and matching of these critical nodule features. Using a retrospective dataset of 451 ultrasound and pathology report pairs, we developed an annotation schema capturing nodule characteristics. Two LLMs, Llama-3.3 70B and QwQ-32B, were benchmarked against manual annotations. Both models demonstrated near-perfect extraction accuracy for clinically relevant features such as location, size, and biopsy results. Notably, QwQ-32B achieved an F1 score of 0.987 on the complex multi-step reasoning task of matching nodules across reports. Our findings suggest that integrating LLMs into clinical annotation workflows can significantly reduce clinician workload and inter-rater variability while maintaining high accuracy.
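For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score for cross-report nodule matching could be computed. It assumes each match is represented as a pair of nodule identifiers (one from the ultrasound report, one from the pathology report); this representation, the function name `f1_score`, and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Set, Tuple

# Hypothetical representation: (ultrasound_nodule_id, pathology_nodule_id).
Match = Tuple[str, str]


def f1_score(predicted: Set[Match], reference: Set[Match]) -> float:
    """F1 of predicted nodule matches against manually annotated matches."""
    true_pos = len(predicted & reference)
    if true_pos == 0:
        # No correct matches: precision and recall are both zero.
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only; not drawn from the study.
reference = {("us_1", "path_1"), ("us_2", "path_2"), ("us_3", "path_3")}
predicted = {("us_1", "path_1"), ("us_2", "path_2"), ("us_3", "path_4")}
score = f1_score(predicted, reference)
```

In this toy case two of the three predicted pairs agree with the manual annotation, so precision and recall are both 2/3 and the F1 score is 2/3.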

Authors

Lee, Amara, Beon, Swee, Radhachandran, Athreya, Ivezic, Arnold, Speier