Abstract
241799
Introduction: Denoising diffusion probabilistic models (DDPM) have achieved great breakthroughs in computer vision and have been shown to be effective for PET image denoising. However, despite the high generation quality of DDPM, two problems remain: long sampling time (especially for long axial field-of-view PET images) and the accuracy of the generated images. To address these two issues, we propose a consistent denoising diffusion model (CDDM) in this work to achieve fast and accurate PET image denoising.
Methods: The training and sampling processes of the proposed model are shown in Fig. 1. In our model, the forward diffusion process continuously adds noise to the full-dose image; unlike in DDPM, the full-dose image itself does not decay and remains constant. The noise-added images were randomly input to the network together with the time step t, and the 1/50-dose PET image was input in an additional channel as a conditional prior. The network was trained to predict the full-dose image rather than the random noise, which keeps training more stable. The sampling process started from noise-added images built on an estimate of the full-dose image. Our consistent denoising diffusion model was trained on the PyTorch 1.9.1 platform using one NVIDIA GeForce RTX 4090 GPU. Due to memory constraints, training used 2.5D patches of size 3×144×256 with a batch size of 10. In this work, the output of a 3D Unet (trained to map 1/50-dose to full-dose PET images), plus slight added noise, served as the initial estimate of the full-dose image; in this way, 3D information is transferred to our model. The total number of sampling time steps is 5, and the network structure of CDDM is the same as that of DDPM. The study contained 50 18F-FDG datasets obtained on a Biograph Vision Quadra PET/CT scanner (Siemens Healthineers) with a scanning time of 6 min and a mean administered dose of 221.7±48.0 MBq. The corresponding 1/50-dose PET images were generated by rebinning the list-mode data. In this study, 25 datasets were used for training, 5 for validation, and 20 for testing.
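The forward process described above (the full-dose image is kept constant while noise is added) and the x0-prediction training target can be sketched as follows. This is a minimal NumPy illustration of our reading of the method, not the authors' implementation; the noise schedule `sigmas` and the identity "network" used in the sanity check are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward process: the clean (full-dose) image x0 does not decay;
# noise of increasing scale sigma_t is simply added on top of it.
def add_noise(x0, sigma_t, rng):
    return x0 + sigma_t * rng.standard_normal(x0.shape)

# Placeholder noise schedule for the 5 sampling time steps
# (the actual schedule values are not given in the abstract).
sigmas = np.linspace(0.1, 1.0, 5)

x0 = rng.random((3, 144, 256))          # a 2.5D training patch, as in the Methods
x_t = add_noise(x0, sigmas[-1], rng)    # most-noised sample

# Training target: the network regresses toward the full-dose image x0
# itself (not toward the noise), so the loss is a direct MSE against x0.
def training_loss(pred_x0, x0):
    return float(np.mean((pred_x0 - x0) ** 2))

print(training_loss(x_t, x0))  # loss of the un-denoised input, as a sanity check
```

At sampling time, the estimate of the full-dose image (here, the 3D Unet output) would replace `x0` when constructing the noise-added starting point.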
Results: Figure 2(a) shows the coronal view of the denoised images for one test dataset. CDDM-1 is the result of a single sample and CDDM-10 is the average of 10 samples; each sample runs 5 time steps. The result of the 3D Unet is smoother, whereas the diffusion-model-based methods generate images with a texture similar to that of the full-dose image. Due to the absence of accurate constraints, some important lesions (yellow arrow) are missing in the results generated by DDPM. Our model uses the result of the 3D Unet as the initial full-dose estimate; after sampling, it generates denoised PET images with more accurate lesion structures than the original 3D Unet result (red arrow), and some biases are corrected (white arrow). The average of ten samples is very similar to a single sample, indicating that our method strongly constrains the structure of the generated image. Figure 2(b) displays box plots of the PSNRs and SSIMs of the different methods. CDDM-10 (PSNR, 28.25±3.20; SSIM, 0.9159±0.0158) and CDDM-1 (PSNR, 28.20±3.19; SSIM, 0.9115±0.0162) achieved higher mean (±SD) quantitative scores than the 3D Unet (PSNR, 27.52±3.13; SSIM, 0.9056±0.0184) and DDPM (PSNR, 28.06±2.83; SSIM, 0.8992±0.0188). Figure 2(c) shows the variance map of the 10 samples for the patient in Figure 2(a); the variance is higher in the lesion region and the brain. Regarding sampling time, CDDM needs only 5 time steps (1.41 s per patient), whereas DDPM runs 1000 time steps (240 s per patient).
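For reference, the PSNR metric reported in Figure 2(b) can be computed with the standard definition below. This is a generic NumPy sketch, not necessarily the exact evaluation script used in this work; the random images are purely illustrative.

```python
import numpy as np

def psnr(reference, test, data_range=None):
    """Peak signal-to-noise ratio in dB; data_range defaults to the reference max."""
    if data_range is None:
        data_range = reference.max()
    mse = np.mean((reference - test) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Illustrative stand-ins for a full-dose slice and a denoised slice.
rng = np.random.default_rng(1)
full_dose = rng.random((144, 256))
denoised = full_dose + 0.01 * rng.standard_normal(full_dose.shape)
print(psnr(full_dose, denoised))
```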
Conclusions: In this work, we proposed a consistent denoising diffusion model (CDDM) for PET image denoising. The proposed model generates denoised PET images with better structural accuracy than the 3D Unet and DDPM, while taking only 1.41 s to sample one patient. Our future work will focus on further clinical evaluation.