Abstract
1428
Objectives For accurate analysis of treatment response in patients with numerous bone lesions, benign lesions should be excluded from the analysis. This study aims to assess different metrics that can be extracted from consecutive [F-18]NaF PET/CT images and determine which metrics are useful for differentiation of benign and malignant bone lesions.
Methods 13 patients with advanced metastatic prostate cancer received pre-treatment and follow-up [F-18]NaF PET/CT scans during the course of chemotherapy or androgen receptor (AR)-directed therapy. Three nuclear medicine physicians identified all lesions on the baseline images and determined by consensus which lesions were definitely benign and which were definitely malignant. A variable SUV threshold, based on background NaF uptake in different parts of the skeleton (skull, spine, pelvis, etc.), was used for lesion segmentation on both baseline and follow-up images to allow volumetric feature extraction. Articulated-registration-based matching was used to propagate lesion contours from baseline to follow-up scans. For each segmented lesion contour, 3 standardized uptake value (SUV) metrics, 3 Hounsfield unit (HU) metrics, and lesion volume were extracted. The extracted metrics were maximum (max), mean, and standard deviation (hetero). The percent response of each metric at follow-up was also assessed. The distributions of these metrics for definitely benign and definitely malignant lesions were compared using t-tests. Using physician consensus as the gold standard, logistic regression models using only baseline metrics were compared with models using both baseline and response metrics.
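As a minimal sketch of two quantities used above, the following Python computes a percent response and an AUC for a single metric. The abstract does not define percent response, so the formula here (change relative to baseline) is an assumption; the function names and the SUVmax values are hypothetical and are not study data. For a model with a single predictor, the fitted logistic function is monotone in that predictor, so the AUC of the model equals the rank-based AUC of the metric itself (or its complement if the fitted coefficient is negative).

```python
def percent_response(baseline, follow_up):
    """Percent change of a metric from baseline to follow-up.

    Assumed definition: 100 * (follow_up - baseline) / baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return 100.0 * (follow_up - baseline) / baseline


def auc(scores_malignant, scores_benign):
    """Rank-based AUC: the probability that a randomly chosen malignant
    lesion has a higher score than a randomly chosen benign lesion,
    with ties counted as one half."""
    wins = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))


# Made-up illustrative SUVmax values (NOT data from the study).
malignant_suvmax = [25.0, 40.0, 18.0, 60.0]
benign_suvmax = [15.0, 20.0, 12.0]

print(auc(malignant_suvmax, benign_suvmax))
print(percent_response(40.0, 30.0))
```

Models that combine several metrics require fitting a multivariable logistic regression, which this sketch does not attempt.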
Results 350 malignant lesions and 45 benign lesions were identified by nuclear medicine physician consensus. Of these, 25 malignant and 7 benign lesions were not detected on the follow-up scan. Logistic regression models using baseline SUV metrics (AUCs of 0.76, 0.76, and 0.74 for SUVmax, SUVmean, and SUVtotal, respectively) performed better than models using HU metrics (AUCs of 0.59, 0.61, and 0.61 for HUmax, HUmean, and HUtotal, respectively). Volume was the worst baseline model (AUC=0.53). In every case, adding the response values of a metric improved the model by a small to moderate amount (AUC increases of 0.01 to 0.7, p values from 0.007 to 1). A single model using all response metrics in addition to all baseline metrics was only slightly superior to a model using all baseline metrics alone (AUC=0.82 vs AUC=0.79, respectively), and ANOVA testing showed that the difference was not significant (p>0.05).
Conclusions Adding response values to the baseline values of metrics helps improve the classification results of logistic regression models. Incorporating additional metrics into more advanced classification algorithms could further improve the classification results.