Abstract
Purpose: Although texture features have shown potential as quantitative biomarkers of intratumoral heterogeneity, one needs to be confident that they can detect meaningful changes reflecting clinical response. The objective was to introduce and evaluate the response-to-repeatability (R/R) of PET texture features as a measure of the range of detectable, significant feature changes.
Methods: Response-to-repeatability (R/R) of texture features was assessed in individual lesions in patients with metastatic castration-resistant prostate cancer. Lesions were segmented and matched across three 18F-NaF PET/CT scans (two baseline and one on-treatment follow-up) using Quantitative Total Bone Imaging (QTBI) software. For each lesion, 47 PET-based texture features representing 5 matrix groups were measured across voxel patches: 6 histogram-based first-order metrics, 22 second-order metrics from the grey-level co-occurrence matrix, 11 grey-level run-length features, 5 neighboring grey-level dependence features, and 3 metrics from neighborhood grey-tone difference matrices. Test-retest repeatability was assessed with the coefficient of variation (CV), the intra-class correlation coefficient (ICC), and 95% limits of agreement (LOA). The R/R of each feature was evaluated as the proportion of changes from baseline to follow-up that were beyond repeatability margins (i.e., outside the LOA).
Results: A total of 265 NaF-avid bone lesions were identified in 18 patients who received double baseline scans. R/R varied within and across matrix groups: 41/47 (87%) features demonstrated R/R > 5%, 21/47 (45%) features demonstrated R/R > 10%, and 11/47 (23%) features demonstrated R/R > 20%. The magnitude of a feature's R/R at follow-up did not correlate with the magnitude of its repeatability, which also varied greatly across features: 26/47 (55%) features demonstrated low variability (CV < 10%) and 39/47 (83%) features demonstrated medium variability (CV < 20%). The 95% LOA of test-retest measurements varied across texture features, from the narrowest LOA of [0.998, 1.001] to the widest LOA of [0.22, 4.86]. 42/48 (88%) texture features demonstrated high ICC (ICC > 0.75). 39/47 (83%) features demonstrated both high ICC and significant R/R (R/R > 5%).
Conclusion: Although texture features may demonstrate high test-retest repeatability, a commonly evaluated index of quantitative biomarkers, they may not effectively detect meaningful change at follow-up (i.e., low R/R). We present a method to evaluate response-to-repeatability as a criterion for identifying quantitative imaging biomarkers of response for use in treatment assessment.
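The R/R calculation described in the Methods can be illustrated with a minimal sketch for a single texture feature. The abstract does not state how the LOA were computed. Because the reported LOA bracket 1, the sketch assumes they are ratio-scale limits derived from log-transformed test-retest ratios, and that each lesion has a single baseline value to compare against follow-up. The function names and both assumptions are illustrative and are not taken from the study.

```python
import numpy as np

def limits_of_agreement(test, retest):
    """95% limits of agreement on the ratio scale.

    Assumption (not stated in the abstract): LOA are computed from the
    log of the retest/test ratio and back-transformed with exp().
    """
    log_ratio = np.log(np.asarray(retest, dtype=float) / np.asarray(test, dtype=float))
    mean = log_ratio.mean()
    sd = log_ratio.std(ddof=1)  # sample standard deviation
    return float(np.exp(mean - 1.96 * sd)), float(np.exp(mean + 1.96 * sd))

def response_to_repeatability(baseline, followup, loa):
    """Fraction of lesions whose follow-up/baseline ratio falls outside the LOA."""
    ratio = np.asarray(followup, dtype=float) / np.asarray(baseline, dtype=float)
    lower, upper = loa
    outside = (ratio < lower) | (ratio > upper)
    return float(outside.mean())

# Hypothetical feature values for four lesions (illustrative only, not study data).
loa = limits_of_agreement(test=[1.0, 2.0, 4.0, 3.0], retest=[1.1, 1.9, 4.2, 2.9])
rr = response_to_repeatability(baseline=[1.0, 2.0, 4.0, 3.0],
                               followup=[1.0, 3.5, 4.1, 1.2], loa=loa)
```

Under these assumptions, a feature with narrow LOA but little treatment-induced change yields a low R/R, which is the situation the Conclusion warns about.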