Abstract
109
Objectives: Deep-learning methods have been applied to PET image denoising. However, when the noise levels of the training and testing images differ, a deep-learning model could introduce spatial blurring and degrade denoising performance. In this study, the impact of mismatched noise levels between training and testing images on denoising low-dose PET images was investigated, and potential solutions were proposed.
Methods: 18F-FDG PET images of 168 patients acquired on a Siemens mCT PET/CT were included in this study. Low-count PET images were generated by rebinning the list-mode data at 20% (2 samples), 40% (2 samples), and 60% (1 sample) of the counts using uniform down-sampling. Of all patients, 100 studies were randomly selected as training data and the remaining 68 as testing data. The 500 training images (100 patients x 5 count samples) were sorted into five noise-level groups, with noise level quantified by the normalized standard deviation (NSTD) measured within a 3-cm-diameter 3D spherical ROI in the liver. The 340 testing images (68 patients x 5 count samples) were sorted into five groups using the same NSTD ranges as the training groups: 0-0.12, 0.12-0.14, 0.14-0.17, 0.17-0.20, and 0.20-0.50 for Groups 1 to 5, respectively. Each model was named after the group it was trained on; e.g., Model 1 was trained on Group 1 images. Model/Group 1 corresponded to the lowest noise level and Model/Group 5 to the highest. As a control, a model was also trained on a group covering the full noise range (NSTD 0-0.5) and was named ModelAll. A 3D U-Net was used in this study. All training groups contained the same number of images: 100 noisy images as inputs, with the corresponding full-dose images as labels. Mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) relative to the full-dose images were used for evaluation.
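The NSTD-based grouping and two of the evaluation metrics can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: NSTD is taken as the population standard deviation divided by the mean inside the liver sphere, the voxel size and ROI center are hypothetical inputs, each NSTD range is assumed to include its lower edge and exclude its upper edge, and PSNR uses the reference image maximum as the peak value (other peak conventions exist). SSIM is omitted here; a windowed implementation such as `skimage.metrics.structural_similarity` could be used instead.

```python
import numpy as np

# Upper edges of NSTD Groups 1-4 from the abstract (Group 5 covers 0.20-0.50).
# Assumption: lower edge inclusive, upper edge exclusive.
GROUP_EDGES = np.array([0.12, 0.14, 0.17, 0.20])


def spherical_roi_mask(shape, center_vox, voxel_size_mm, diameter_mm=30.0):
    """Boolean mask of a 3D sphere (default 3-cm diameter) centered at center_vox.

    center_vox and voxel_size_mm are hypothetical inputs chosen by the user.
    """
    grids = np.indices(shape, dtype=float)
    dist_sq = np.zeros(shape, dtype=float)
    for axis in range(3):
        # Convert voxel offsets from the center to millimetres along this axis.
        offset_mm = (grids[axis] - center_vox[axis]) * voxel_size_mm[axis]
        dist_sq += offset_mm ** 2
    return dist_sq <= (diameter_mm / 2.0) ** 2


def nstd(image, mask):
    """Normalized standard deviation (std / mean) of voxel values inside the ROI."""
    values = image[mask]
    return float(values.std() / values.mean())


def assign_group(nstd_value):
    """Map an NSTD value to noise group 1 (least noisy) through 5 (noisiest)."""
    return int(np.digitize(nstd_value, GROUP_EDGES)) + 1


def mse(test, reference):
    """Mean squared error between a denoised image and the full-dose reference."""
    return float(np.mean((test - reference) ** 2))


def psnr(test, reference):
    """PSNR in dB, assuming the reference image maximum as the peak value."""
    error = mse(test, reference)
    if error == 0:
        return float("inf")
    return float(10.0 * np.log10(reference.max() ** 2 / error))


if __name__ == "__main__":
    # Synthetic stand-in for a liver region: mean 10 with Gaussian noise.
    rng = np.random.default_rng(0)
    liver = 10.0 + rng.normal(0.0, 1.5, size=(40, 40, 40))
    # Hypothetical 2-mm isotropic voxels and ROI center.
    mask = spherical_roi_mask(liver.shape, (20, 20, 20), (2.0, 2.0, 2.0))
    value = nstd(liver, mask)
    print(f"NSTD = {value:.3f}, group = {assign_group(value)}")
```

Using `np.digitize` with the four inner edges keeps the group boundaries in one array, so the same function sorts both training and testing images consistently.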
Results: For each testing group, the different U-Net models produced denoised images with different trade-offs between noise reduction and resolution. When trained on less noisy images and tested on noisier images, a model showed limited denoising performance but better preserved resolution. When trained on noisier images and tested on less noisy images, a model achieved greater noise reduction but introduced substantial blurring. Models 2, 3, and 4 yielded the highest SSIM for the Group 2, 3, and 4 testing images, respectively, whereas no such trend was clear for MSE and PSNR. For none of the evaluation metrics did ModelAll outperform the individual models across all testing groups.
Conclusions: Mismatched training and testing datasets resulted in different trade-offs between noise reduction and resolution preservation. Our results suggest that U-Net models trained on images whose noise-level range matches that of the testing images are generally desirable, although different imaging tasks and evaluation metrics could require differently optimized models. A one-size-fits-all model trained on all noise levels is not recommended; a personalized strategy is needed for deep-learning-based PET denoising.