Abstract
Background: Positron emission tomography (PET) is an inherently noisy process. This becomes even more problematic in very low-count situations, e.g. low radiotracer dose, short scan time, or fast dynamic framing, where reconstructions of the sparse PET data may produce images of poor quality and limited clinical use. This work presents an experiment designed to evaluate the use of a convolutional neural network (CNN) for improving PET images reconstructed from low-count data.
Methods: A 3D convolutional neural network (CNN) was developed and trained with real patient data. The CNN consisted of layers organized in the “U” arrangement, i.e. with symmetric contracting and expanding paths, comprising groups of convolution-pooling layers on the contracting path and groups of convolution-deconvolution layers on the expanding path. No activation function was used on the output layer, so that the output image could take a continuous range of values. The network training data were PET images acquired from patients with non-small cell lung cancer, divided into 64 × 64 × 64 volume patches with voxel dimensions 2.0863 × 2.0863 × 2.0313 mm. Nine patients were administered 225.3 ± 5.6 (range 214.6–233.1) MBq of FDG and scanned with 1 or 2 bed positions over the torso, for 10 minutes per bed position, resulting in 132.7 ± 57.6 million true counts (prompts minus randoms) per bed. These PET data were used to emulate lower count levels through random listmode decimation according to 9 predefined levels: 20, 15, 10, 7.5, 5, 2, 1, 0.5 and 0.25 million trues; independent realizations were generated at each count level. The emulated images, along with their corresponding “ground truth” full-count images, made up the paired training set. Three training approaches were investigated: training with images from all of the emulated count levels, with only the 1 million trues set, and with only the 10 million trues set. For each approach, the trained CNN was tested on “unseen” samples to assess the qualitative and quantitative impact on the images at the various count levels.
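The count-level emulation described above can be sketched as random thinning of the listmode true-event stream. The following is a minimal illustration, not the study's actual code: the event list is a toy stand-in, the scale is reduced, and all names (`decimate_listmode`, `full_stream`, the seed-per-level scheme) are assumptions.

```python
import random

def decimate_listmode(events, target_trues, seed=0):
    """Randomly thin a listmode event stream to a target true-count level.

    `events` is a hypothetical list of true-coincidence events; calling with
    a different seed yields an independent realization, mirroring the
    independent realizations generated at each emulated count level.
    """
    rng = random.Random(seed)
    if target_trues >= len(events):
        return list(events)
    return rng.sample(events, target_trues)

# Emulate the study's count ladder (in millions of trues) on a toy stream
# scaled down so that the full stream corresponds to the 20 M level.
full_stream = list(range(20_000))                # stand-in for full-count data
levels = [20, 15, 10, 7.5, 5, 2, 1, 0.5, 0.25]  # millions of trues in the study
scaled = [int(lv / 20 * len(full_stream)) for lv in levels]
realizations = {lv: decimate_listmode(full_stream, n, seed=i)
                for i, (lv, n) in enumerate(zip(levels, scaled))}
```

Each decimated stream would then be reconstructed with the same pipeline as the full-count data to form the low-count half of a training pair.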
Results: For each of the 3 training approaches, the CNNs were determined to reach acceptable convergence after 4000 epochs, as indicated by the network energy function and performance on the validation set. The networks were able to learn and, to some degree, reproduce the latent distributions from which the noisy data originated; this image “enhancement” manifested not as a global smoothing effect, but as local image regularization. As expected, the CNN trained with the noisy 1 million trues dataset learned to impose a higher degree of regularization, which resulted in a loss of detail when applied to higher-count data. The CNN trained with the 10 million trues dataset preserved more detail and better contrast recovery but produced slightly noisier output on low-count data. Performance of the CNN trained with all the data fell between these two extremes.
Conclusions: In terms of image quality, the CNN-enhanced images were superior to the original low-count images for all training approaches. Image pixel variance was significantly reduced and anatomical structures were more clearly visible. However, the enhancement also suppressed high spatial frequencies in the output images, and quantification accuracy suffered, particularly for small regions of high-contrast focal uptake; this effect was especially pronounced for PET data comprising fewer than 1 million true counts. In general, the best results were obtained when the CNN was trained on data with noise levels similar to those of the test images.
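The abstract does not detail how pixel variance and quantification accuracy were measured. As a rough illustration only, the sketch below computes two plausible metrics on a synthetic volume: background pixel variance and contrast recovery for a small focal hot region. The array shapes, masks, and the crude one-axis neighborhood-mean "denoiser" are all illustrative assumptions, not the study's pipeline.

```python
import numpy as np

def pixel_variance(img, mask):
    """Variance of voxel values inside a background mask."""
    return float(np.var(img[mask]))

def contrast_recovery(img, lesion_mask, bkg_mask, true_contrast):
    """Measured-over-true contrast ratio for a focal uptake region."""
    measured = img[lesion_mask].mean() / img[bkg_mask].mean() - 1.0
    return float(measured / true_contrast)

# Toy 64^3 volume: noisy unit background with a small hot sphere (4:1 uptake).
rng = np.random.default_rng(0)
img = rng.normal(1.0, 0.2, size=(64, 64, 64))
zz, yy, xx = np.ogrid[:64, :64, :64]
lesion = (zz - 32) ** 2 + (yy - 32) ** 2 + (xx - 32) ** 2 <= 3 ** 2
img[lesion] += 3.0          # true contrast = 3 (4:1 lesion-to-background)
bkg = ~lesion

# Crude denoising stand-in: neighborhood mean along one axis.  Background
# variance drops, but the small hot region is also blurred, illustrating the
# trade-off between noise reduction and focal quantification.
smoothed = img.copy()
smoothed[1:-1] = (img[:-2] + img[1:-1] + img[2:]) / 3.0
```

Even this trivial filter reproduces the qualitative trade-off reported above: lower background variance at the cost of reduced measured contrast in small focal regions.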