Abstract
Introduction: Delineating tumors in PET/CT images typically relies on manual segmentation, which is time-consuming and prone to inter-observer variability. To address this, we trained a 3D UNet model on cropped cubic patches from PET/CT images with voxel-level annotations. The model was assessed on the test set using the Dice similarity coefficient (DSC) for segmentation accuracy and three criteria for lesion-level detection. Additionally, lesion measures (SUVmean, SUVmax, TMTV, and TLG) were computed from the predicted lesion masks and compared with measures derived from the physicians' masks using paired t-tests. This work aims to provide an automated and reliable method for lesion segmentation and measure computation in PET/CT images.
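The lesion measures and DSC named above can be computed directly from a PET SUV volume and binary masks. The following is a minimal sketch assuming NumPy arrays; the function and variable names are illustrative, not taken from the study code.

```python
import numpy as np

def lesion_measures(suv, mask, voxel_volume_ml):
    """Compute SUVmean, SUVmax, TMTV, and TLG for one binary lesion mask.

    suv: 3D array of SUV values; mask: 3D binary lesion mask;
    voxel_volume_ml: volume of one voxel in ml (0.008 ml at 2 mm isotropic).
    """
    voxels = suv[mask > 0]
    suv_mean = float(voxels.mean())
    suv_max = float(voxels.max())
    tmtv = float(mask.sum()) * voxel_volume_ml  # total metabolic tumor volume (ml)
    tlg = suv_mean * tmtv                       # total lesion glycolysis
    return suv_mean, suv_max, tmtv, tlg

def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())
```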
Methods: In this study, PSMA PET scans from 380 prostate cancer patients were manually segmented by nuclear medicine physicians, yielding a dataset with lesion counts ranging from 1 to 5 per scan. The data were partitioned into training (n=258), validation (n=65), and test (n=57) sets. Non-randomized transforms were applied to all sets, while randomized transforms were applied exclusively to the training set as data augmentation. The non-randomized transforms involved normalizing CT intensities, cropping regions outside the body, and resampling to an isotropic voxel spacing of 2.0 mm × 2.0 mm × 2.0 mm. The randomized transforms comprised random cropping of cubic patches (128×128×128 voxels) centered on lesion or background voxels with 80% and 20% probabilities, respectively.
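A minimal sketch of such a preprocessing pipeline is shown below, assuming the MONAI library; the abstract does not name the framework, so the specific transform choices, dictionary keys, and CT window bounds here are assumptions.

```python
from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd, ScaleIntensityRanged,
    CropForegroundd, Spacingd, RandCropByPosNegLabeld,
)

# Non-randomized transforms, applied to training, validation, and test sets.
non_random = Compose([
    LoadImaged(keys=["pet", "ct", "label"]),
    EnsureChannelFirstd(keys=["pet", "ct", "label"]),
    # Normalize CT intensities (window bounds are illustrative)
    ScaleIntensityRanged(keys=["ct"], a_min=-1000, a_max=1000,
                         b_min=0.0, b_max=1.0, clip=True),
    # Crop regions outside the body, using the PET image as foreground source
    CropForegroundd(keys=["pet", "ct", "label"], source_key="pet"),
    # Resample to isotropic 2.0 mm voxel spacing
    Spacingd(keys=["pet", "ct", "label"], pixdim=(2.0, 2.0, 2.0),
             mode=("bilinear", "bilinear", "nearest")),
])

# Training-only randomized transform: 128x128x128 patches centered on lesion
# voxels with 80% probability (pos:neg = 4:1) and background voxels otherwise.
random_train = RandCropByPosNegLabeld(
    keys=["pet", "ct", "label"], label_key="label",
    spatial_size=(128, 128, 128), pos=4, neg=1, num_samples=1,
)
```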
A 3D UNet model was trained with concatenated PET and CT patches as inputs, minimizing the Dice loss on the training set; the checkpoint with the highest Dice score on the validation set was selected and evaluated on the test set. Model performance on the test set was assessed using patient-level foreground DSC and lesion-wise sensitivity under three detection criteria (C1, C2, and C3): C1 required at least one voxel of the lesion to be detected, C2 required an intersection over union (IoU) greater than 50%, and C3 required detection of the voxel containing the SUVmax. Paired t-tests were employed to compare ground truth and predicted lesion measures (SUVmean, SUVmax, TMTV, and TLG) at a significance level α = 0.05, with the null hypothesis of equal means. To account for multiple testing across the four measures, the α value was adjusted using Bonferroni correction to α_corrected = 1.25 × 10⁻².
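The three detection criteria admit a straightforward per-lesion implementation. The sketch below reflects our reading of C1-C3; it assumes NumPy arrays and an already-isolated single-lesion mask, and the names are illustrative.

```python
import numpy as np

def lesion_detected(pred, lesion, suv, criterion):
    """True if a single ground-truth lesion is detected under C1, C2, or C3.

    pred, lesion: 3D binary masks (prediction, one ground-truth lesion);
    suv: 3D array of SUV values aligned with the masks.
    """
    inter = np.logical_and(pred, lesion)
    if criterion == "C1":   # at least one lesion voxel predicted
        return bool(inter.any())
    if criterion == "C2":   # intersection over union > 50%
        union = np.logical_or(pred, lesion).sum()
        return inter.sum() / union > 0.5
    if criterion == "C3":   # voxel containing the lesion's SUVmax predicted
        idx = np.unravel_index(np.argmax(np.where(lesion, suv, -np.inf)),
                               suv.shape)
        return bool(pred[idx])
    raise ValueError(f"unknown criterion: {criterion}")
```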
Results: The fully automated deep-learning method proposed in this study yielded mean and median Dice scores of 0.39 and 0.41, respectively, on the test set. Fig 1 presents the model's sensitivity in detecting lesions under the three detection criteria. Specifically, the results show a median sensitivity of 1 for criterion C1, indicating that the model identified at least one voxel of the majority of lesions across the dataset. Overall, the detection and segmentation results are comparable to state-of-the-art network performances reported in the literature for this difficult problem. Additionally, Fig 2 highlights the model's performance in detecting lesions of varying volumes. For lesions with smaller volumes (TMTV < 1 cm³), the model detected 26 out of 40 lesions, a detection ratio of 0.65 within this volume range. Furthermore, we evaluated the reproducibility of the SUVmean, SUVmax, TMTV, and TLG measures. At the corrected α level, the p-values for SUVmean, SUVmax, and TLG provide no evidence to reject the null hypothesis of equal means between the predicted and ground-truth values of these measures (Fig 3). A summary of the model's performance on the segmentation metrics is shown in Fig 4.
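For reference, the paired t-tests with Bonferroni correction described in Methods can be reproduced along these lines; this is a sketch in which random placeholder arrays stand in for the per-lesion ground-truth and predicted values.

```python
import numpy as np
from scipy import stats

# Placeholder data: pairs of (ground truth, predicted) values per measure.
rng = np.random.default_rng(0)
measures = {
    "SUVmean": (rng.normal(5, 1, 57), rng.normal(5, 1, 57)),
    "SUVmax":  (rng.normal(12, 3, 57), rng.normal(12, 3, 57)),
    "TMTV":    (rng.normal(4, 1, 57), rng.normal(4, 1, 57)),
    "TLG":     (rng.normal(20, 5, 57), rng.normal(20, 5, 57)),
}

# Bonferroni correction for four tests: 0.05 / 4 = 1.25e-2
alpha_corrected = 0.05 / len(measures)
for name, (gt, pred) in measures.items():
    t_stat, p_val = stats.ttest_rel(gt, pred)  # paired t-test
    print(f"{name}: p = {p_val:.3g}, reject H0 = {p_val < alpha_corrected}")
```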
Conclusions: We developed a deep learning-based method for segmenting 3D lesion volumes on whole-body PSMA PET images, demonstrating substantial potential for automating the delineation and quantification of prostate cancer lesions.