Abstract
Objectives: PET segmentation is an active field of research, and the segmentation of heterogeneous tumors remains particularly challenging in this setting. The emergence of deep learning, together with increased hardware performance, has recently promoted the use of artificial neural networks in the medical field. In this context, convolutional neural networks (CNNs) are particularly well suited to pattern recognition tasks, including image-based classification and segmentation. Although several recent CT and MRI studies have reported promising results, the use of CNNs specifically in PET imaging remains largely under-evaluated. In this study, we investigated the performance of a three-dimensional convolutional neural network (3D-CNN) in segmenting heterogeneous 18F-FDG PET lung tumors, compared with an optimized expert-based reference standard.

Methods: Seventy-six 18F-FDG PET lung tumors with various degrees of heterogeneity were retrospectively included. For each PET tumor, a probabilistic estimate of the ground truth was computed from the set of six expert-based manual segmentation results using the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm. The 76 PET samples (inputs) and their corresponding STAPLE ground-truth segmentations (outputs) constituted the full dataset of the 3D-CNN procedure. All PET samples were centered and scaled. The dataset was then randomly partitioned into training, test, and validation sets (50, 6, and 20 PET samples, respectively). A modified version of the widely used 3D U-Net CNN was built, using the difference between the non-weighted binary cross-entropy loss and a modified Dice similarity coefficient (DSC) as the loss function. The activation function was the scaled exponential linear unit (SELU), giving the network a self-normalizing property. For the training phase of the 3D-CNN, the following parameters were used: number of epochs = 750; batch size = 1; no batch normalization; prediction threshold = 0.5 to produce binary segmentations.
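The loss described above (binary cross-entropy minus a Dice term) can be sketched as follows. This is a minimal NumPy illustration of the formula only, not the authors' implementation; the smoothing constant `eps` is an assumption:

```python
import numpy as np

def soft_dice(pred, target, eps=1e-7):
    # Soft Dice similarity coefficient computed over the whole volume;
    # eps is an assumed smoothing constant to avoid division by zero.
    inter = np.sum(pred * target)
    return float((2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps))

def bce(pred, target, eps=1e-7):
    # Non-weighted binary cross-entropy, averaged over voxels.
    p = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

def loss(pred, target):
    # Loss = BCE - DSC: minimizing it drives BCE toward 0 and DSC toward 1,
    # so a well-trained network approaches a loss of about -1.
    return bce(pred, target) - soft_dice(pred, target)
```

With this formulation, the stable training loss of -0.91 reported in the Results is consistent with a DSC of 0.93 and a small residual cross-entropy term.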
To prevent overfitting, augmentation techniques including random shifts, crops, and rotations, combined with elastic deformations, were used. Moreover, a small learning rate of 0.0001 was used, with a scheduled decrease by a factor of 10 after 500 and 600 epochs. The test set was used only as feedback to monitor the learning process and detect possible overfitting. For the validation phase, four metrics were used to evaluate the performance of the 3D-CNN segmentation procedure: the Dice similarity coefficient (DSC), the mean absolute SUVmean and SUVmax errors, and the mean absolute relative volume error. Reported 95% confidence intervals (CIs) were computed using a non-parametric bootstrap procedure with 3000 replications for each metric.

Results: After 750 epochs, the training and test loss functions were stable at -0.91 (DSC = 0.93), and the corresponding learning curves showed no signs of overfitting. Applied to the validation set, the fully trained 3D-CNN provided the following performance metrics: the mean DSC was 0.911 (95% CI, 0.887-0.925); the mean absolute SUVmax error was null for all 20 validation samples; the mean absolute SUVmean error was 0.26 (95% CI, 0.20-0.37); and the mean absolute relative volume error was 10.9% (95% CI, 8.1-14.2%).

Conclusion: Compared with the optimized expert-based ground truth, our 3D-CNN provided excellent performance metrics for segmenting heterogeneous lung tumors, despite a limited training set. A fully trained 3D-CNN is automated and does not require a complex image pre-processing workflow. Given more diversified training samples, the performance of our 3D-CNN could be further increased. Larger multicenter studies are warranted to validate such a 3D-CNN approach as a potential new reference standard in this setting.
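The non-parametric bootstrap CI computation described in the Methods can be sketched as below. The percentile method and the fixed seed are assumptions, since the abstract does not specify how the intervals were derived from the 3000 replications:

```python
import random
import statistics

def bootstrap_ci(values, n_boot=3000, alpha=0.05, seed=0):
    # Percentile bootstrap CI for the mean of a per-sample metric
    # (e.g. the per-tumor DSC values of the 20 validation samples).
    # Resample with replacement n_boot times, then read the CI bounds
    # off the sorted distribution of resampled means.
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]          # 2.5th percentile
    hi = means[int((1 - alpha / 2) * n_boot) - 1]  # 97.5th percentile
    return lo, hi
```

Applied to the 20 per-sample DSC values, this procedure yields an interval of the form reported in the Results (e.g. 0.887-0.925 around a mean of 0.911).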