Abstract
Introduction: Conventional PET image reconstruction techniques (FBP, OSEM, MLEM) suffer from data/model mismatch, data inconsistency, and over-fitting, which lead to artifacts and noise in the resulting images. Recently, deep learning-based PET reconstruction techniques that directly transform raw PET (sinogram) data into images, such as DeepPET [1], have demonstrated better performance. However, the mean-square-error-based supervision used in these techniques usually yields solutions with overly smooth textures and poor perceptual quality. Image-conditioned denoising diffusion probabilistic models (cDDPM) [2] are another class of deep learning techniques that can generate photorealistic results. These models, however, suffer from insufficient correspondence and consistency when the conditioning input and the reconstructed output are defined in two different domains (sinogram conditional input and PET image output, Sino-DDPM). To address this limitation of cDDPM, we propose a new robust pipeline (R2U-DDPM) that uses a pre-trained attention R2U-Net as an auxiliary prior to inform the cDDPM of the coarse structure of the reconstructed PET image, increasing model reliability and generating realistic results from sinogram inputs.
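To make the conditioning mechanism concrete, the sketch below illustrates the standard DDPM closed-form forward (noising) process and how a conditional denoiser typically receives its condition via channel-wise concatenation. This is a generic illustration with stand-in random arrays, not the paper's implementation; the schedule values and function names are assumptions.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form (DDPM forward process)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Linear beta schedule, as in the original DDPM paper [2].
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))    # stand-in for a target PET image
cond = rng.standard_normal((64, 64))  # stand-in for the conditioning image

xt, eps = forward_diffuse(x0, T - 1, alpha_bar, rng)
# In a cDDPM, the denoiser sees the condition alongside the noisy sample,
# commonly by channel-wise concatenation:
denoiser_input = np.stack([xt, cond])  # shape (2, 64, 64)
```

At the final timestep the signal is almost fully destroyed (alpha_bar[T-1] is near zero), which is why a coarse structural prior in the condition channel can meaningfully guide the reverse process.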
Methods: Our framework uses a learned attention R2U-Net as guidance to the cDDPM in order to improve the performance of Sino-DDPM. The attention R2U-Net generates a coarse PET image through a deterministic process given a sinogram as input, and the cDDPM takes it as the condition for the final PET image prediction (Fig. 1). The learned attention R2U-Net architecture (Fig. 2) was built on the recurrent U-Net architecture with two modifications: the addition of multiple designed self-attention blocks and the addition of residual convolutional operations. Specifically, these enhancements aid in the refinement of the intermediate feature maps. We also included a skip connection between the input feature map and the refined feature map to further optimize this process. The attention R2U-Net was supervised by a mean square error loss. For data preparation, we simulated 2D 18F-FDG PET images using 20 3D brain phantoms from BrainWeb, with a voxel size of 2.086×2.086×2.031 mm and a matrix size of 344×344×127 matching the Siemens mMR scanner. For each phantom, we selected 85 non-contiguous slices and generated high-count sinograms. Using the OSEM algorithm with point-spread-function modeling, we generated a total of 7130 sinogram/PET image pairs and randomly split them into 6620/255/255 as training/validation/testing sets, respectively. All models were implemented on the PyTorch platform and trained on NVIDIA A100 GPUs for 100 epochs. We compared the performance of our R2U-DDPM against the OSEM reference and contrasted the results with those from DeepPET and Sino-DDPM. The comparison was based on visual inspection and the distortion metrics PSNR (dB) and SSIM.
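A minimal PyTorch sketch of a recurrent residual ("R2") block of the kind the attention R2U-Net builds on: the same convolution is applied recurrently to refine a feature map, and an identity skip adds the block input back to the refined output. The class names, the recurrence depth `t`, and the omission of the self-attention blocks are simplifying assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class RecurrentConv(nn.Module):
    """Apply the same 3x3 conv t extra times, feeding back the activation."""
    def __init__(self, channels, t=2):
        super().__init__()
        self.t = t
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h = self.conv(x)
        for _ in range(self.t):
            h = self.conv(x + h)  # recurrent refinement of the feature map
        return h

class R2Block(nn.Module):
    """Recurrent residual block: stacked recurrent convs plus a skip path."""
    def __init__(self, in_ch, out_ch, t=2):
        super().__init__()
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)  # match channel counts
        self.body = nn.Sequential(RecurrentConv(out_ch, t),
                                  RecurrentConv(out_ch, t))

    def forward(self, x):
        x = self.shortcut(x)
        return x + self.body(x)   # residual skip connection

x = torch.randn(1, 1, 64, 64)     # e.g. a single-channel feature map
y = R2Block(1, 32)(x)
```

In a full R2U-Net, blocks like this replace the plain double-conv blocks of U-Net at each encoder/decoder stage, with attention gates inserted on the skip connections.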
Results: Visual inspection of the reconstructed images from the various algorithms shows that R2U-DDPM produces the best images, with the least blur relative to the OSEM reference (Fig. 3). The average PSNR (dB)/SSIM over the testing dataset for DeepPET, Sino-DDPM, and R2U-DDPM were 28.41/0.949, 27.55/0.893, and 28.48/0.955, respectively.
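For reference, the PSNR figures reported above follow the standard definition, sketched below with numpy on random stand-in images (the function name and the normalization by the reference image's dynamic range are conventional assumptions; libraries such as scikit-image provide equivalent implementations).

```python
import numpy as np

def psnr(ref, img, data_range=None):
    """Peak signal-to-noise ratio in dB of img against a reference image."""
    ref = np.asarray(ref, dtype=np.float64)
    img = np.asarray(img, dtype=np.float64)
    if data_range is None:
        data_range = ref.max() - ref.min()  # dynamic range of the reference
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(0)
ref = rng.random((128, 128))                       # stand-in reference image
noisy = ref + 0.01 * rng.standard_normal(ref.shape)  # mildly corrupted copy
value = psnr(ref, noisy)                           # roughly 40 dB here
```

SSIM, the companion metric, additionally compares local luminance, contrast, and structure rather than pixel-wise error alone, which is why it better tracks perceptual quality.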
Conclusions: Our results show that the R2U-DDPM algorithm has the potential to produce images that are qualitatively and quantitatively closer to the OSEM reference than other sinogram-to-PET-image deep learning techniques. Further validation of this approach on clinical 3D patient data is forthcoming.
References:
[1] Häggström, Ida, C. Ross Schmidtlein, Gabriele Campanella, and Thomas J. Fuchs. "DeepPET: A deep encoder–decoder network for directly solving the PET image reconstruction inverse problem." Medical image analysis 54 (2019): 253-262.
[2] Ho, Jonathan, Ajay Jain, and Pieter Abbeel. "Denoising diffusion probabilistic models." Advances in neural information processing systems 33 (2020): 6840-6851.