Abstract
575
Objectives: Artificial intelligence (AI)-based methods are showing promise in reconstructing and processing nuclear-medicine images[1,2]. Conventionally, these methods are evaluated using figures of merit (FoMs) such as root mean squared error (RMSE)[3,4], structural similarity index (SSIM)[3,5], and peak signal-to-noise ratio (PSNR)[3,6]. However, nuclear-medicine images are acquired for clinical tasks, such as detection and quantification, and it is unclear whether these FoMs correlate with task performance. Our objective was to study whether evaluating AI-based methods using conventional FoMs yields the same interpretation as objective evaluation on clinical tasks.
Methods: There is much interest in using AI-based methods to acquire images at lower dose or shorter acquisition times. Given this interest, we conducted this study in the context of evaluating an AI-based method to denoise myocardial perfusion SPECT (MPS) images acquired at five-times lower dose (Fig. 1a). Using the XCAT phantom[7], a total of 14,000 digital phantoms with and without defects were generated. Eighteen types of myocardial defects, spanning 3 extents, 3 severities, and 2 locations, all based on existing clinical data[8,9], were evenly distributed in the defect-present population. Projection data for these phantoms at both normal and five-times lower dose were generated using highly realistic simulations that modeled various SPECT image-degrading processes. Both sets of projection data were reconstructed using a 2D OSEM-based technique. A convolutional neural network (CNN)-based method was developed to predict the normal-dose image from the low-dose image (i.e., to denoise the low-dose image). The CNN was optimized and trained using 12,000 low- and normal-dose image pairs, and the remaining 2,000 pairs were used for testing. The performance of this AI-based method on the clinical task of detecting myocardial perfusion defects was objectively evaluated using a previously validated Hotelling-observer-based technique designed for this task[10]. Detection performance was quantified using the area under the ROC curve (AUC). In addition, the RMSE, SSIM, and PSNR were computed for the CNN-denoised images. For comparison, we also computed these FoMs for the images reconstructed using the pure OSEM technique.
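As a rough illustration of the two kinds of evaluation described in the Methods, the sketch below computes two of the conventional fidelity FoMs (RMSE and PSNR) between a reference and a processed image, and a nonparametric AUC estimate from observer test statistics. It is a minimal sketch under stated assumptions, not the implementation used in the study: the function names are hypothetical, the PSNR peak value is assumed to be the dynamic range of the reference image, and the AUC is estimated with the Wilcoxon-Mann-Whitney statistic rather than with the Hotelling-observer pipeline cited above. SSIM is omitted because it needs a windowed implementation.

```python
import numpy as np


def rmse(reference, estimate):
    """Root mean squared error between two images of equal shape."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))


def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB.

    Assumption: if data_range is not given, the peak value is taken as
    max(reference) - min(reference). Other conventions exist.
    """
    reference = np.asarray(reference, dtype=float)
    if data_range is None:
        data_range = float(reference.max() - reference.min())
    err = rmse(reference, estimate)
    if err == 0:
        return float("inf")
    return float(20.0 * np.log10(data_range / err))


def auc_from_scores(defect_present, defect_absent):
    """Nonparametric AUC (Wilcoxon-Mann-Whitney estimate).

    Inputs are observer test statistics for defect-present and
    defect-absent images. Ties count as half a correct ranking.
    """
    pos = np.asarray(defect_present, dtype=float)[:, None]
    neg = np.asarray(defect_absent, dtype=float)[None, :]
    wins = np.sum(pos > neg)
    ties = np.sum(pos == neg)
    return float((wins + 0.5 * ties) / (pos.size * neg.size))
```

The fidelity functions compare pixel values against a reference image, whereas `auc_from_scores` depends only on how well the observer's test statistic separates the two classes. The two kinds of measure therefore need not move together.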
Results: On the conventional FoMs, the AI-based method significantly outperformed the pure OSEM method (p < 0.001), yielding higher SSIM and PSNR values and lower RMSE values (Table 1). For example, the SSIM values with the AI- and OSEM-based methods were 0.869 (95% CI: 0.868, 0.871) and 0.851 (95% CI: 0.850, 0.853), respectively. Visually, too, the AI-denoised images looked less noisy (Fig. 1c). However, in the observer study, the AI- and OSEM-based methods yielded AUC values of 0.73 (95% CI: 0.71, 0.76) and 0.74 (95% CI: 0.72, 0.77), respectively, with almost overlapping ROC curves (Fig. 1b). The difference was not statistically significant, and the mean AUC was in fact slightly lower with the AI-based approach. Thus, the results from the observer study conflicted with the evaluation using the conventional metrics.
Conclusions: The results motivate the need for objective evaluation of AI-based methods for nuclear-medicine imaging on clinically relevant tasks[11], as evaluation with conventional metrics may not capture task performance. Further, the results motivate optimizing AI-based approaches on clinically relevant tasks.