Abstract
252230
Introduction: Objective evaluation of imaging methods on clinical tasks typically requires access to a known ground truth. In myocardial perfusion imaging (MPI), one way to obtain this ground truth is to insert synthetic defects, whose prevalence and properties, such as size, location, and severity, are known exactly. The clinical relevance of such studies is strengthened when the prevalence and the distribution of synthetic-defect properties match those observed in real-world patient populations. However, obtaining these statistics from image data can be challenging. Clinical reports typically describe the defect properties, so analyzing these reports over large datasets may provide a mechanism for deriving the population statistics of these defects. Analyzing such large datasets manually, however, is time-consuming and labor-intensive. To address this issue, we developed a large language model (LLM)-based approach and demonstrated its application in extracting the prevalence and properties of myocardial perfusion defects from MPI SPECT clinical reports.
Methods: This IRB-approved retrospective study was performed on a dataset of N=1000 patients who had undergone MPI SPECT, with clinical reports obtained between September 2018 and June 2024. These reports contained impressions and findings, including defect presence; defect properties such as location, size, severity, and reversibility; and patient diagnosis, as reviewed by board-certified nuclear-medicine or nuclear-cardiology physicians. Using prompt-engineering techniques, a HIPAA-compatible LLM, namely WashU-ChatGPT, was provided with detailed instructions to yield the defect presence and defect properties for each patient from the provided clinical report. We developed an automated and streamlined process to input the clinical reports into the model and store the model outputs. We first validated the performance of the approach on this task with a test set of N=100 patients by comparing its outputs against a manual analysis, using accuracy as the figure of merit. We also quantified the efficiency of the proposed method relative to manual analysis in terms of time savings. After all 1000 reports were processed, the defect prevalence and the distribution of defect properties were generated and then reviewed by expert physicians.
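The validation step described above, comparing LLM outputs against manual analysis with accuracy as the figure of merit, could be sketched as follows. This is a minimal illustration, not the study's actual code: the field names in `CATEGORIES`, the function `per_category_accuracy`, and the dictionary-per-patient record layout are all assumptions introduced here, and an exact string match is assumed to count as agreement.

```python
from typing import Dict, List, Optional

# Hypothetical field names for the six categories listed in the abstract.
CATEGORIES = ["presence", "location", "size", "severity", "reversibility", "diagnosis"]

Record = Dict[str, Optional[str]]


def per_category_accuracy(llm_rows: List[Record], manual_rows: List[Record]) -> Dict[str, float]:
    """Fraction of patients for which the LLM label equals the manual label.

    Rows are assumed to be aligned by patient (same order in both lists).
    A category missing from both records is treated as agreement (None == None).
    """
    if len(llm_rows) != len(manual_rows):
        raise ValueError("LLM and manual record lists must have the same length")
    if not manual_rows:
        raise ValueError("at least one patient record is required")

    accuracy = {}
    for category in CATEGORIES:
        matches = sum(
            1
            for llm, manual in zip(llm_rows, manual_rows)
            if llm.get(category) == manual.get(category)
        )
        accuracy[category] = matches / len(manual_rows)
    return accuracy
```

Called with the aligned LLM and manual records for the N=100 test set, the function would return one accuracy value per category, which is the kind of quantity summarized in Figure 1.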
Results: The proposed approach yielded accuracies higher than 96% across all assessed categories: defect presence, location, size, severity, reversibility, and diagnosis (Figure 1). Further, the method was 3.1 times faster than manual analysis, requiring only 12 seconds per patient and, more notably, minimal manual labor. The defect prevalence and distribution generated from our analysis are shown in Figure 2. Our results show substantial non-uniformity in the distribution of these defect properties. For example, in terms of defect severity, the percentages of mild, moderate, and severe defects were 34%, 29%, and 17%, respectively.
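To make the reported efficiency figures concrete, the short calculation below works out what they imply for the full dataset. It assumes that "3.1 times faster" means the manual per-patient time is 3.1 times the LLM per-patient time; the function names are illustrative only, and the manual time is derived from the reported figures rather than reported directly.

```python
LLM_SECONDS_PER_PATIENT = 12.0  # reported in the abstract
SPEEDUP = 3.1                   # reported in the abstract
N_PATIENTS = 1000               # reported in the abstract


def implied_manual_seconds(llm_seconds: float, speedup: float) -> float:
    """Manual per-patient time implied by the speedup (assumed to be a time ratio)."""
    return llm_seconds * speedup


def total_hours(seconds_per_patient: float, n_patients: int) -> float:
    """Total processing time in hours for n_patients reports."""
    return seconds_per_patient * n_patients / 3600.0


manual_seconds = implied_manual_seconds(LLM_SECONDS_PER_PATIENT, SPEEDUP)
llm_hours = total_hours(LLM_SECONDS_PER_PATIENT, N_PATIENTS)
manual_hours = total_hours(manual_seconds, N_PATIENTS)

print(f"Implied manual time per patient: {manual_seconds:.1f} s")
print(f"LLM total for {N_PATIENTS} patients: {llm_hours:.2f} h")
print(f"Manual total for {N_PATIENTS} patients: {manual_hours:.2f} h")
print(f"Implied time saved: {manual_hours - llm_hours:.2f} h")
```

Under this reading, manual analysis takes roughly 37 seconds per report, so the 1000 reports would need about 10.3 hours manually versus about 3.3 hours with the LLM-based approach.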
Conclusions: Our findings demonstrate the feasibility, efficiency, and accuracy of an LLM-based approach for determining the distribution of myocardial perfusion defect properties. Our results highlight the need to derive these population distributions to strengthen the clinical relevance of evaluation studies that use synthetic defects. They also motivate further use of the approach on larger datasets, as well as the development of similar LLM-based approaches for determining population-based statistics in other applications.