Abstract
Introduction: ChatGPT is a multimodal large language model created by OpenAI with both free and paid access. While it primarily functions as a chatbot that imitates human conversation, it can also be directed through prompt engineering to accomplish machine-learning-based tasks. At our institution, most backend informatics processes require in-house software development by a group of highly trained and skilled staff, including physicians, informaticians, programmers, and information technologists. Accomplishing a task such as auto-population of text within a structured PET/CT report can take a significant amount of time and human resources. The purpose of this study was to determine the capability of ChatGPT to generate structured PET/CT reports from unstructured data and to compare the performance of GPT-4 with that of GPT-3.5.
Methods: Structured, standardized PET/CT report templates with form fields are used at our institution. The empty template for an FDG PET/CT report was used for prompt engineering. PET/CT reports read by a Nuclear Medicine physician for 10 random de-identified patients within the past month were used for analysis. The clinical statement, technique, findings, and impression data were removed from these reports to be used as prompt input. After a trial-and-error period of three attempts, an optimized prompt was engineered with GPT-3.5 to generate a complete PET/CT report by filling in the structured, standardized PET/CT report template with the de-identified patient scan data. GPT-4 was then compared with GPT-3.5 using increasing levels of disorganization of the de-identified patient data: organized data, clustered data, and unorganized clustered data. Outputs from GPT-4 and GPT-3.5 were compared with the original Nuclear Medicine physician report, and with each other, for errors.
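As a minimal illustrative sketch only (the study itself used the ChatGPT interface directly), the prompt described above could be assembled programmatically from the empty template and the de-identified scan data; the file names and helper function here are hypothetical, and the wording mirrors the engineered prompt reported in the Results:

from pathlib import Path

def build_prompt(template_path: str, scan_data_path: str) -> str:
    """Combine de-identified scan data with the empty report template into one prompt."""
    template = Path(template_path).read_text()
    scan_data = Path(scan_data_path).read_text()
    return (
        "Please add these scan findings and interpretation, "
        f"'{scan_data}', to this template: '{template}', "
        "and if a section has no findings then add 'No abnormal uptake'."
    )

prompt = build_prompt("fdg_pet_ct_template.txt", "deidentified_scan_data.txt")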
Results: The prompt engineered to generate the report was, "Please add these scan findings and interpretation, ‘[de-identified patient scan data here]’, to this template: ‘[structured, standardized PET/CT report template here]’, and if a section has no findings then add ‘No abnormal uptake’." Reports were generated by ChatGPT within 1 minute of prompting. GPT-3.5 generated nearly complete reports but made errors at all three levels of data disorganization. Examples included placing abnormal findings in the wrong section, adding "No abnormal uptake." to sections with abnormal findings, copying punctuation and dictation errors from the original de-identified patient scan data (e.g., "image 1:30" instead of "image 130", and "produce and meter" instead of "cm"), and changing the formatting of punctuation (Figure 1). GPT-4 generated complete reports without these errors at all three levels of data disorganization; its only issue was changing the formatting of punctuation, which was resolved by engineering the prompt to include "do not change formatting". In addition, GPT-4 corrected the human punctuation and dictation errors, changing "1:30" to "130" and "produce and meter" to "cm" (Figure 2).
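Again as an illustrative sketch rather than the workflow used in the study, the engineered prompt could be submitted to both models through the openai Python package's chat-completions interface and the two outputs compared against the physician report; the model names and the `prompt` variable from the earlier sketch are assumptions for this example:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_report(prompt: str, model: str) -> str:
    """Send the engineered prompt to the given model and return the generated report text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

report_gpt35 = generate_report(prompt, "gpt-3.5-turbo")
report_gpt4 = generate_report(prompt, "gpt-4")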
Conclusions: ChatGPT is capable of generating a PET/CT report from a structured, standardized report template and patient scan data at varying levels of organization. GPT-4 outperforms GPT-3.5, which made mistakes incompatible with clinical implementation. GPT-4 executes quickly and accurately, and adds benefit to physician-read reports by correcting human-made punctuation and dictation mistakes.