Language Model of Lung Nodules in LNDb Medical Reports

The description of lung nodules in medical reports may play a critical role in clinical decision-making and longitudinal analysis. However, the unstructured nature of many medical reports makes this information difficult to access, analyze, and reuse. To address this, we propose a method to analyze Portuguese-language medical reports from the LNDb dataset. We implemented a multi-step approach comprising sentence relevance classification, named entity recognition, and relation extraction. The goal was to identify and organize key information about lung nodules, such as location, size, and characteristics, so that it can be used to compute statistical metrics or to facilitate reannotation of imaging data. Each step of the approach applies transformer-based models, including BioBERTpt and BERTimbau. The best performance was achieved with BERTimbau-large, which reached an F1 score of 0.87 for named entity recognition and an accuracy of 0.69 for relation extraction. Although relation extraction proved particularly challenging, the results demonstrate the potential of this method to improve the efficiency and accuracy of nodule analysis. The adoption of automatic tools like this in clinical practice is an inevitable step forward, offering significant time savings and improved accuracy in treatment.
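The method itself relies on fine-tuned transformer models. As a rough illustration of how the three steps chain together, the sketch below swaps each model for a hypothetical rule-based stand-in: a keyword filter for sentence relevance, regular expressions for entity recognition, and nearest-mention linking for relation extraction. The entity labels (NODULE, SIZE, LOCATION), the patterns, the function names, and the example report are all assumptions made for illustration. They are not the LNDb annotation scheme or the authors' implementation.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy rules standing in for the fine-tuned BERT models.
ENTITY_PATTERNS = {
    "NODULE": re.compile(r"\bn[óo]dulos?\b", re.IGNORECASE),
    "SIZE": re.compile(r"\b\d+(?:[.,]\d+)?\s?mm\b", re.IGNORECASE),
    "LOCATION": re.compile(
        r"\blobo (?:superior|m[ée]dio|inferior) (?:direito|esquerdo)\b",
        re.IGNORECASE,
    ),
}

Entity = Tuple[str, str, int]  # (label, surface text, character offset)


def split_sentences(report: str) -> List[str]:
    """Naive splitter: break after a period that is followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=\.)\s+", report) if s.strip()]


def is_relevant(sentence: str) -> bool:
    """Step 1 stand-in: keep only sentences that mention a nodule."""
    return ENTITY_PATTERNS["NODULE"].search(sentence) is not None


def extract_entities(sentence: str) -> List[Entity]:
    """Step 2 stand-in: tag NODULE, SIZE and LOCATION spans, ordered by offset."""
    found = [
        (label, m.group(0), m.start())
        for label, pattern in ENTITY_PATTERNS.items()
        for m in pattern.finditer(sentence)
    ]
    return sorted(found, key=lambda e: e[2])


def link_entities(entities: List[Entity]) -> List[Dict[str, str]]:
    """Step 3 stand-in: attach each attribute to the closest nodule mention.

    A later attribute of the same type overwrites an earlier one.
    """
    nodules = [e for e in entities if e[0] == "NODULE"]
    records = [{"nodule": text} for _, text, _ in nodules]
    for label, text, start in entities:
        if label == "NODULE" or not nodules:
            continue
        idx = min(range(len(nodules)), key=lambda i: abs(nodules[i][2] - start))
        records[idx][label.lower()] = text
    return records


def structure_report(report: str) -> List[Dict[str, str]]:
    """Run the three steps sentence by sentence and collect nodule records."""
    records: List[Dict[str, str]] = []
    for sentence in split_sentences(report):
        if is_relevant(sentence):
            records.extend(link_entities(extract_entities(sentence)))
    return records


if __name__ == "__main__":
    report = (
        "Nódulo no lobo superior direito com 6 mm. Sem derrame pleural. "
        "Nódulo no lobo inferior esquerdo com 4,5 mm."
    )
    for record in structure_report(report):
        print(record)
```

On this made-up report, the sentence without a nodule is discarded and each remaining sentence yields one record linking a nodule mention to its location and size. The point of the design is the interface between steps (relevant sentences, then typed spans, then linked records), which is where learned models would replace these rules.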
Cancer
Chronic respiratory disease
Care/Management

Authors

Ferreira, Ferreira, Guimaraes, Coimbra, Campilho, Jorge