Meeting Report: Poster - PhysicianPharm

Evaluation of large language models in natural language processing of PET/CT free-text reports

Tyler Bradshaw and Steve Cho
Journal of Nuclear Medicine May 2021, 62 (supplement 1) 1188;
Tyler Bradshaw
1University of Wisconsin, Madison, WI, United States
Steve Cho
2University of Wisconsin-Madison, Madison, WI, United States

Abstract


Objectives: Natural language processing (NLP) has many promising applications in nuclear medicine, including assisted report generation, synoptic reporting, and intelligent information retrieval. Recently, large transformer-based language models such as Bidirectional Encoder Representations from Transformers (BERT) have achieved state-of-the-art results on a number of NLP tasks. These models have not been explored in the domain of nuclear medicine, which has a unique vocabulary and reporting style. The goal of this study is to investigate the performance of language models in nuclear medicine through the task of report classification.

Methods: Different language models were investigated for their ability to correctly classify free-text PET/CT reports. The task was to classify each report into one of five categories according to the lymphoma FDG PET Deauville five-point visual criteria score (DS) contained in the report. PET/CT reports from 2009-2018 containing “Deauville” and “lymphoma” were identified, extracted from the University of Wisconsin-Madison clinical PACS system, and anonymized. DS values were automatically extracted from the reports, and all mentions of DS were then removed from the report text. The reports’ findings and impression sections were combined and used for model training. Several NLP methods were evaluated for their impact on classification performance. Two classes of language models were compared: doc2vec and BERT. For doc2vec, the report text was preprocessed using standard cleaning techniques, including stemming and removal of stop words and punctuation; custom synonym replacement was also performed. For the BERT models, only custom synonym replacement was used. Three BERT variants were investigated: the baseline BERT model, bioBERT (pretrained on a corpus of medical literature), and bio-clinicalBERT (pretrained on medical literature plus clinical/discharge notes). The added value of appending custom nuclear medicine vocabulary (e.g., “SUV”) to the BERT model’s vocabulary was also investigated. For all models, a DS classifier that took the language model’s representation vector as input was trained on top of the language model. To determine whether the models relied on confounding factors (e.g., report length) to classify reports, a subset of reports was manually altered by replacing disease-positive sentences with disease-negative sentences, and vice versa, and the impact of this sentence swapping on model predictions was evaluated.
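The doc2vec preprocessing described above (custom synonym replacement, followed by removal of punctuation and stop words) can be sketched in a few lines of Python. The stop-word list and synonym map below are illustrative stand-ins, since the abstract does not give the actual lists used, and the stemming step is omitted for brevity:

```python
import string

# Illustrative stand-ins; the study's actual stop-word list and
# synonym map are not specified in the abstract.
STOP_WORDS = {"the", "a", "an", "is", "of", "with", "and", "in"}
SYNONYMS = {"standardized uptake value": "suv", "deauville score": "ds"}

def preprocess_report(text: str) -> str:
    """Clean report text: synonym replacement, lowercasing,
    punctuation removal, and stop-word removal."""
    text = text.lower()
    for phrase, token in SYNONYMS.items():  # custom synonym replacement
        text = text.replace(phrase, token)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

print(preprocess_report(
    "The standardized uptake value is elevated, with no new lesions."
))  # → suv elevated no new lesions
```

The cleaned strings would then be tokenized and fed to doc2vec; for the BERT models, only the synonym-replacement step would apply.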

Results: A total of 1813 reports were included in the study, with 10% (181) used for validation and 20% (363) used for testing. Of the three BERT models, bioBERT had the best overall performance, although the differences between models were small. The 5-class accuracy of the doc2vec model was 58% with a weighted kappa of 0.56; the 5-class accuracy of the bioBERT model was 65% with a weighted kappa of 0.64. Adding nuclear medicine vocabulary to the model’s vocabulary had no consistent impact on performance, with changes in accuracy ranging from +10% to -12%. When categories were collapsed into responding (DS 1-3) and non-responding (DS 4-5) cases, doc2vec and bioBERT had the same classification accuracy of 81%. Sentence swapping changed the model predictions in approximately 40% of cases, suggesting that the models were interpreting report sentiment but also potentially relying on confounding factors.
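The weighted kappa reported above is, for ordinal labels such as Deauville scores, commonly computed with quadratic weights; the abstract does not state which weighting was used, so the quadratic form below is an assumption. A self-contained sketch, including the DS 1-3 vs. DS 4-5 collapse used for the binary comparison:

```python
from collections import Counter

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """Cohen's kappa with quadratic weights for ordinal labels 1..n_classes."""
    n = len(y_true)
    obs = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        obs[t - 1][p - 1] += 1
    hist_t, hist_p = Counter(y_true), Counter(y_pred)
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2  # quadratic disagreement weight
            num += w * obs[i][j]                      # observed weighted disagreement
            den += w * hist_t[i + 1] * hist_p[j + 1] / n  # chance-expected
    return 1.0 - num / den

def collapse_response(ds):
    """Collapse a Deauville score: responding (DS 1-3) vs non-responding (DS 4-5)."""
    return "responding" if ds <= 3 else "non-responding"

print(quadratic_weighted_kappa([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # → 1.0
print(collapse_response(2), collapse_response(4))
```

Quadratic weighting penalizes a DS 1 report misclassified as DS 5 far more than one misclassified as DS 2, which matches the ordinal nature of the score.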

Conclusions: Language models were able to accurately interpret the sentiment contained in free-text PET/CT reports. The large language model bioBERT outperformed doc2vec on the more complex task of classifying reports into five classes, but the two models performed similarly on the simpler binary classification task. Future work will explore how confounding factors influence model performance using NLP interpretation methods. Research support: This research was supported by GE Healthcare.
