Abstract
18F-fluorodihydrotestosterone (18F-FDHT) is a radiolabeled analog of the androgen receptor’s primary ligand that is currently being credentialed as a biomarker for prognosis, response, and pharmacodynamic effects of new therapeutics. As part of the biomarker qualification process, we prospectively assessed its reproducibility and repeatability in men with metastatic castration-resistant prostate cancer. Methods: We conducted a prospective multiinstitutional study of metastatic castration-resistant prostate cancer patients undergoing 2 (test/retest) 18F-FDHT PET/CT scans on 2 consecutive days. Two independent readers evaluated all examinations and recorded SUVs, androgen receptor–positive tumor volumes (ARTVs), and total lesion uptake (TLU) for the most avid lesion detected in each of 32 predefined anatomic regions. The relative absolute difference and repeatability coefficient (RC) of each metric were calculated between the test and retest scans. Linear regression analyses, intraclass correlation coefficients (ICCs), and Bland–Altman plots were used to evaluate repeatability of 18F-FDHT metrics. The coefficient of variation and ICC were used to assess interobserver reproducibility. Results: Twenty-seven patients with 140 18F-FDHT–avid regions were included. The best repeatability among 18F-FDHT uptake metrics was found for SUV metrics (SUVmax, SUVmean, and SUVpeak), with no significant differences in repeatability among them. Correlations between the test and retest scans were strong for all SUV metrics (R2 ≥ 0.92; ICC ≥ 0.97). The RCs of the SUV metrics ranged from 21.3% (SUVpeak) to 24.6% (SUVmax). Test and retest ARTV and TLU values were highly correlated (R2 and ICC ≥ 0.97), although their variability was significantly higher than that of SUV (RCs > 46.4%).
Prostate-specific antigen level, Gleason score, weight, and age did not affect repeatability, nor did total injected activity, uptake measurement time, or differences in uptake time between the 2 scans. Restricting the analysis to the most avid lesion per patient, the 5 most avid lesions per patient, lesions of 4.2 mL or more, or lesions with an SUV of 4 g/mL or more, or normalizing SUV to the area under the parent plasma activity concentration–time curve, did not significantly affect repeatability. All metrics showed high interobserver reproducibility (ICC > 0.98; coefficients of variation from ≤0.2% to 10.8%). Conclusion: Uptake metrics derived from 18F-FDHT PET/CT show high repeatability and interobserver reproducibility.
Prostate cancer, including its terminal phase, metastatic castration-resistant prostate cancer (mCRPC), is driven by the androgen receptor (AR) signaling axis. This AR addiction is the basis of numerous AR-targeted therapies for mCRPC that prolong survival and improve quality of life (1,2).
Given the central role of the AR axis in mCRPC and its treatment, there is a pressing need to credential noninvasive biomarkers capable of monitoring the pharmacologic targeting and effects of these drugs. 18F-fluorodihydrotestosterone (18F-FDHT) is a radiolabeled analog of dihydrotestosterone, the primary ligand of the AR, which offers an innovative way of directly imaging the primary molecular engine of castration-resistant prostate cancer with PET/CT. Preliminary studies using 18F-FDHT PET/CT in patients with castration-resistant prostate cancer have demonstrated safety, feasibility, favorable pharmacokinetic properties, accuracy in identifying tumor sites, and associations with survival (3–7). Furthermore, 18F-FDHT was instrumental in demonstrating AR targeting in the early-phase clinical trials of enzalutamide and apalutamide, 2 AR-directed therapies that have demonstrated substantial clinical activity in mCRPC (8,9).
This international collaboration was undertaken to assess the repeatability and reproducibility of 18F-FDHT uptake measures, a crucial component of biomarker development (10,11). Repeatability is defined as the measurement precision under a set of repeatability conditions (e.g., repeated scans within 1 subject) and reproducibility as the measurement precision under a set of different conditions in similar subjects (e.g., different locations, operators, readers) (12,13).
The aim of this study was to prospectively assess repeatability and reproducibility of whole-body 18F-FDHT uptake metrics of mCRPC metastases.
MATERIALS AND METHODS
Patients were recruited prospectively from 3 tertiary academic centers: Memorial Sloan Kettering Cancer Center (United States), VU University Medical Center (The Netherlands), and Austin Health (Australia). Each site opened its own study and managed the regulatory requirements specific to each institution and country. By prospective design, the site-specific trials were to collect and combine data under a predefined statistical plan. The lead site (Memorial Sloan Kettering) holds a U.S. Food and Drug Administration Investigational New Drug application for 18F-FDHT (#66115) and provided letters of cross-reference to facilitate submission for regulatory approval at the other sites. The institutional review boards of each center approved the study, and all patients provided written informed consent before inclusion. The clinicaltrials.gov identifier is NCT00588185 (this number applies only to Memorial Sloan Kettering, the only U.S.-based site).
Patient Eligibility and Study Design
Eligibility criteria included pathologically proven mCRPC, castrate serum testosterone (≤50 ng/dL), 4 wk or more since patients’ last anticancer pharmacologic therapy, and progressive disease based on a rise in prostate-specific antigen, on RECIST 1.1 imaging evidence of progressive disease, or on 2 or more new metastatic lesions on bone scan not attributable to the flare phenomenon.
Patients who had not undergone surgical castration remained on androgen deprivation therapy with gonadotropin-releasing hormone analogs/inhibitors. Patients who had received enzalutamide or other antiandrogens within 4 wk were excluded, as these agents compete directly with 18F-FDHT for AR binding. The design included a means of evaluating the effect of the interval between the test and retest 18F-FDHT injections on the uptake measurements. Up to 3 test–retest cohorts were planned (cohort 1: days 1 and 2; cohort 2: days 1 and 8; cohort 3: days 1 and 22). Patients were to be studied first in cohort 1; if unstable test–retest 18F-FDHT uptake (defined as a relative difference > 0.15) was observed in 5 or more patients at any time, the study would proceed to the next cohort. Because a relative difference greater than 0.15 was not observed in 5 or more patients in cohort 1, there was no indication to proceed to subsequent cohorts, and all patients underwent 18F-FDHT PET/CT scans on 2 consecutive days.
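The cohort-escalation rule is simple enough to state as code. The sketch below is illustrative only: the helper names are hypothetical, one uptake value per patient is assumed (for example, a per-patient average), and the relative difference is assumed to be taken relative to the mean of the test and retest values, which the protocol text does not specify.

```python
# Hypothetical helper names; the thresholds are the ones stated in the protocol.
COHORT_DAYS = {1: (1, 2), 2: (1, 8), 3: (1, 22)}  # cohort -> (test day, retest day)
UNSTABLE_THRESHOLD = 0.15   # relative difference above which uptake counts as "unstable"
UNSTABLE_PATIENT_LIMIT = 5  # this many unstable patients triggers the next cohort


def relative_difference(test: float, retest: float) -> float:
    """Absolute test-retest difference divided by the mean of the 2 values (assumed)."""
    return abs(retest - test) / ((test + retest) / 2)


def should_escalate(patient_pairs: list[tuple[float, float]]) -> bool:
    """True if 5 or more patients so far show a relative difference > 0.15."""
    unstable = sum(
        1
        for test, retest in patient_pairs
        if relative_difference(test, retest) > UNSTABLE_THRESHOLD
    )
    return unstable >= UNSTABLE_PATIENT_LIMIT


# Example: 4 unstable patients out of 10 would keep the study in cohort 1.
pairs = [(5.0, 7.0)] * 4 + [(5.0, 5.2)] * 6
print(should_escalate(pairs))  # False
```

Counting is cumulative ("at any time"), so the function is meant to be re-run on all paired results collected so far.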
Image Acquisition
Images were acquired using a GE690 or GE710 (GE Healthcare) or Gemini TF64 or Ingenuity TF128 (both from Philips) PET/CT scanner. For each scan, a low-dose CT scan (120–140 kV, 80 mA) was obtained, followed by a dynamic 30-min PET scan over the thorax after intravenous 18F-FDHT administration. All scans were corrected for decay, scatter, random coincidences, and photon attenuation. During the dynamic scans, 3 intravenous samples were drawn at 5, 10, and 30 min after injection. Whole-blood activity concentration, plasma activity concentration, and parent and metabolite fractions (by high-pressure liquid chromatography) of 18F-FDHT were measured. A whole-body PET/CT (mid thigh to mid skull) followed, starting approximately 45 min after injection. A whole-body low-dose CT scan (120–140 kV, 80 mA) was acquired with a section thickness and reconstruction interval of 5 mm and pitch of 0.75–1.5. No oral or intravenous contrast material was administered.
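As a rough illustration of how the sampled blood data could feed the parent plasma AUC used for the SUVAUCpp normalization described under Data Management and Analysis, the sketch below multiplies plasma activity concentrations by the measured parent fractions and integrates with the trapezoidal rule. It is an assumption-laden sketch, not the study's method: the 3 venous samples (5, 10, and 30 min) do not capture the early peak, which in practice would have to come from the dynamic thoracic scan, the normalization form (tissue concentration divided by AUC) is assumed, and all function names are hypothetical.

```python
def parent_plasma_curve(plasma_kbq_ml, parent_fraction):
    """Parent 18F-FDHT concentration: plasma activity concentration x parent fraction."""
    if len(plasma_kbq_ml) != len(parent_fraction):
        raise ValueError("need one parent fraction per plasma sample")
    return [c * f for c, f in zip(plasma_kbq_ml, parent_fraction)]


def trapezoid_auc(times_min, values):
    """Trapezoidal area under a sampled curve (value units x min)."""
    if len(times_min) != len(values) or len(times_min) < 2:
        raise ValueError("need at least 2 paired time points")
    area = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        area += dt * (values[i] + values[i - 1]) / 2
    return area


def normalize_to_auc(tissue_kbq_ml, auc_pp):
    """Hypothetical SUVAUCpp-style value: tissue concentration / parent plasma AUC."""
    return tissue_kbq_ml / auc_pp


# Invented example values at the 5-, 10-, and 30-min sampling times.
parent = parent_plasma_curve([8.0, 6.0, 2.0], [0.5, 0.5, 0.5])  # kBq/mL
auc = trapezoid_auc([5.0, 10.0, 30.0], parent)                   # kBq*min/mL
print(auc)
```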
Data Management and Analysis
The Clinical Trials Network from the Society of Nuclear Medicine and Molecular Imaging provided both centralized data management and access to Imagys®, a web-based Imaging Clinical Trial management system by Keosys, for secure uploading, storage, downloading, and analysis of images.
All images were evaluated independently by a dually trained radiologist/nuclear medicine physician and a nuclear medicine resident, with 8 and 3 y of PET/CT experience, respectively. Lesions were considered suggestive of metastases when uptake was visually higher than blood-pool activity measured in the thoracic aorta or than the background of the tissue in which the lesion was located, and was separate from known physiologic uptake (blood pool and biliary, urinary, and gastrointestinal tracts). Lesion type (bone, nodal, or other soft tissue) and anatomic site (grouped into 11 regions for bone, 11 regions for nodes, and 10 regions for other soft tissue) were recorded (Supplemental Fig. 1; supplemental materials are available at http://jnm.snmjournals.org). The most visually prominent 18F-FDHT–avid lesion in each predefined anatomic region was delineated, and a volume of interest was generated semiautomatically using a 50% isocontour of SUVmax corrected for local background. The following 18F-FDHT uptake metrics, all corrected for body weight, were recorded: SUVmax, SUVpeak (mean value of a 1.2-cm3 spherical region positioned within the lesion to maximize that mean), and SUVmean (mean of all voxels within the lesion). Additionally, these metrics were normalized to the area under the parent plasma activity concentration–time curve (AUC) at 30 min (SUVAUCpp) (14). The 18F-FDHT androgen receptor–positive tumor volume ([ARTV], derived using a 50% threshold of SUVmax corrected for local background) and total lesion uptake ([TLU], defined as SUVmean × ARTV) were calculated.
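The sketch below illustrates the per-lesion metrics on a toy voxel grid, using only the Python standard library. It rests on several assumptions not stated in the text: the background-corrected 50% isocontour is taken as background + 0.5 × (SUVmax − background) and grown as a 6-connected region from the hottest voxel; SUVpeak is approximated by averaging voxels whose centers fall inside a 1.2-mL sphere centered on each VOI voxel; and body-weight SUV assumes decay-corrected inputs. Function names are hypothetical, and this is not the Imagys implementation.

```python
import math
from collections import deque

FACE_NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def suv_bw(conc_kbq_ml, injected_mbq, weight_kg):
    """Body-weight SUV (g/mL): concentration / (injected kBq / body weight in g)."""
    return conc_kbq_ml / (injected_mbq * 1000.0 / (weight_kg * 1000.0))


def delineate(suv, background):
    """Grow a 6-connected VOI from the hottest voxel with a background-corrected 50% threshold."""
    seed = max(suv, key=suv.get)
    threshold = background + 0.5 * (suv[seed] - background)
    voi, queue = {seed}, deque([seed])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in FACE_NEIGHBORS:
            nb = (x + dx, y + dy, z + dz)
            if nb not in voi and suv.get(nb, -math.inf) >= threshold:
                voi.add(nb)
                queue.append(nb)
    return voi


def suv_peak(suv, voi, voxel_mm, sphere_ml=1.2):
    """Highest mean SUV inside a ~1.2-mL sphere centered on any VOI voxel (voxel-center approximation)."""
    radius_mm = (3 * sphere_ml * 1000.0 / (4 * math.pi)) ** (1 / 3)
    reach = [math.ceil(radius_mm / s) for s in voxel_mm]
    offsets = [
        (i, j, k)
        for i in range(-reach[0], reach[0] + 1)
        for j in range(-reach[1], reach[1] + 1)
        for k in range(-reach[2], reach[2] + 1)
        if (i * voxel_mm[0]) ** 2 + (j * voxel_mm[1]) ** 2 + (k * voxel_mm[2]) ** 2 <= radius_mm ** 2
    ]
    best = -math.inf
    for cx, cy, cz in voi:
        vals = [suv[v] for v in ((cx + i, cy + j, cz + k) for i, j, k in offsets) if v in suv]
        best = max(best, sum(vals) / len(vals))
    return best


def lesion_metrics(suv, background, voxel_mm):
    """SUVmax, SUVpeak, SUVmean, ARTV (mL), and TLU = SUVmean x ARTV for one lesion."""
    voi = delineate(suv, background)
    voxel_ml = voxel_mm[0] * voxel_mm[1] * voxel_mm[2] / 1000.0
    suv_mean = sum(suv[v] for v in voi) / len(voi)
    artv_ml = len(voi) * voxel_ml
    return {"SUVmax": max(suv[v] for v in voi), "SUVpeak": suv_peak(suv, voi, voxel_mm),
            "SUVmean": suv_mean, "ARTV_mL": artv_ml, "TLU": suv_mean * artv_ml}


# Synthetic lesion: 4-mm voxels, background SUV 1.0, hot voxel 9.0 with six warm faces at 6.0.
suv = {(x, y, z): 1.0 for x in range(-3, 4) for y in range(-3, 4) for z in range(-3, 4)}
suv[(0, 0, 0)] = 9.0
for d in FACE_NEIGHBORS:
    suv[d] = 6.0
m = lesion_metrics(suv, background=1.0, voxel_mm=(4.0, 4.0, 4.0))
print(m)
```

In this toy lesion, SUVpeak (3.0) falls below SUVmean (about 6.4) because the 7-voxel VOI (0.448 mL) is smaller than the 1.2-mL peak sphere, which therefore averages in background voxels.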
Statistical Analysis
Repeatability and interobserver reproducibility were determined by calculating the relative absolute difference in 18F-FDHT uptake metrics between the test and retest scans and between the values of the uptake metrics measured by the 2 readers. The relative absolute difference was computed as:
$$\text{Relative absolute difference} = \frac{\lvert X_{\text{retest}} - X_{\text{test}} \rvert}{\left(X_{\text{test}} + X_{\text{retest}}\right)/2} \times 100\%$$
where X is the value of a given uptake metric (for interobserver reproducibility, the values of the 2 readers take the place of the test and retest values). If no lesion was identified in a patient, the absolute change was set to zero but was not considered when calculating quantitative repeatability coefficients (RCs). The RC was calculated as 1.96 × SD of the relative absolute differences, per lesion and per patient, for all uptake metrics. Normality was evaluated visually using quantile–quantile plots and histograms. Significance of differences in uptake metrics between the 2 scans and between the 2 readers was assessed using a paired t test. To assess differences in RCs, a Levene test was performed; differences were deemed significant if P < 0.05. Linear regression analyses, intraclass correlation coefficients (ICCs), and Bland–Altman plots were used to evaluate repeatability. Additionally, the coefficient of variation (COV) and ICC were used to investigate interobserver reproducibility.
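The repeatability statistics can be sketched in a few lines of standard-library Python. The sketch assumes that the relative absolute difference is normalized to the mean of the 2 measurements, follows the text in computing the RC as 1.96 × SD of those absolute differences, and computes Bland–Altman bias and 95% limits of agreement from signed differences. Many test–retest studies derive the RC from signed rather than absolute differences, and that choice changes the value. Function names and example values are invented for illustration.

```python
import statistics


def relative_abs_diff(test, retest):
    """|retest - test| / mean(test, retest) x 100, in % (assumed definition)."""
    return abs(retest - test) / ((test + retest) / 2) * 100.0


def repeatability_coefficient(pairs):
    """RC (%) = 1.96 x SD of the relative absolute differences, as described in the text."""
    diffs = [relative_abs_diff(test, retest) for test, retest in pairs]
    return 1.96 * statistics.stdev(diffs)


def bland_altman_limits(pairs):
    """Mean signed relative difference (%) and its 95% limits of agreement."""
    rel = [(retest - test) / ((test + retest) / 2) * 100.0 for test, retest in pairs]
    bias, sd = statistics.mean(rel), statistics.stdev(rel)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical SUVmax pairs (test, retest) for 4 lesions; patients without lesions are excluded.
lesion_pairs = [(6.1, 6.5), (4.0, 3.6), (9.8, 10.4), (5.2, 5.2)]
rc = repeatability_coefficient(lesion_pairs)
bias, lower, upper = bland_altman_limits(lesion_pairs)
print(f"RC = {rc:.1f}%, bias = {bias:.1f}%, LoA = [{lower:.1f}%, {upper:.1f}%]")
```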
A Levene test was also performed to assess the effect of various lesion selection strategies on repeatability and reproducibility: only lesions of 4.2 mL or more (diameter ≥ 2 cm), only lesions with an SUV of 4.0 g/mL or more, and up to the 5 most radiotracer-avid lesions per patient, as suggested by the PERCIST guidelines (15). In addition, the uptake values of these 5 individual target lesions were averaged per patient to obtain mean uptake values. A post hoc linear regression analysis was performed on a per-patient basis to evaluate the influence of prostate-specific antigen level, Gleason score, weight, and age, and of differences in total injected activity and uptake time between the 2 scans. On the basis of previous reports on the repeatability of 18F-FDG uptake in malignant tumors, a test–retest variability of 30% or less was considered acceptable (15,16). All statistical analyses were performed using SPSS 22.0 (SPSS).
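The lesion selection strategies and per-patient averaging can be sketched as follows. The strategy names, the Lesion record, and the choice to rank and filter lesions on the test scan are all assumptions for illustration; PERCIST's limit of 2 target lesions per organ is not modeled.

```python
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Lesion:
    patient: str
    suv_test: float    # e.g., SUVmax on the test scan
    suv_retest: float  # same metric on the retest scan
    volume_ml: float   # ARTV on the test scan (assumed basis for the size filter)


def _by_patient(lesions):
    """Group lesions by patient, preserving input order."""
    groups = defaultdict(list)
    for les in lesions:
        groups[les.patient].append(les)
    return groups


def select_lesions(lesions, strategy):
    """Apply one of the lesion selection strategies (hypothetical strategy names)."""
    if strategy == "all":
        return list(lesions)
    if strategy == "volume>=4.2mL":
        return [les for les in lesions if les.volume_ml >= 4.2]
    if strategy == "suv>=4":
        return [les for les in lesions if les.suv_test >= 4.0]
    if strategy in ("top1", "top5"):
        n = 1 if strategy == "top1" else 5
        chosen = []
        for group in _by_patient(lesions).values():
            # Rank by test-scan uptake; the per-organ cap of PERCIST is not modeled here.
            chosen.extend(sorted(group, key=lambda les: les.suv_test, reverse=True)[:n])
        return chosen
    raise ValueError(f"unknown strategy: {strategy}")


def per_patient_pairs(lesions):
    """Average test and retest uptake over each patient's selected lesions."""
    return {
        patient: (statistics.mean(les.suv_test for les in group),
                  statistics.mean(les.suv_retest for les in group))
        for patient, group in _by_patient(lesions).items()
    }


lesions = [
    Lesion("P01", 10.0, 11.0, 5.0),
    Lesion("P01", 3.0, 3.2, 1.0),
    Lesion("P02", 6.0, 5.5, 4.2),
]
top1 = select_lesions(lesions, "top1")
print([les.suv_test for les in top1], per_patient_pairs(lesions))
```

The per-patient (test, retest) pairs can then be passed to the same RC calculation used for individual lesions.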
Additional details on study design, image acquisition and processing, radio–high-performance liquid chromatography, and analysis of 18F-FDHT metabolism are available on request.
RESULTS
Thirty-two patients were included. The minimum number of paired evaluations per patient (i.e., per the anatomic regions described in the “Materials and Methods” section) was 1; the maximum was 12. Five patients were excluded from the RC calculations, because no lesions were detected on PET. Overall, 27 patients with a total of 140 18F-FDHT–avid lesions were evaluated. No significant differences in patient characteristics were observed between the test and retest scans. The total injected activities at center 2 were significantly lower than those of centers 1 and 3; however, no systematic differences were found in the SUVs from centers 1 and 3 (Tables 1 and 2).
Repeatability
The best repeatability of 18F-FDHT PET/CT uptake metrics was found for SUV, for which the predefined variability threshold of 30% or less was met (Table 3; Fig. 1). No significant differences in variability were found among SUVmax, SUVmean, and SUVpeak, and correlations between the test and retest scans were strong (R2 ≥ 0.92; ICC ≥ 0.97). Bland–Altman plots did not show skewness of the data (Figs. 2 and 3). The RCs of the overall SUV metrics ranged from 21.3% (SUVpeak) to 24.6% (SUVmax). RCs for SUVmean and SUVpeak were significantly smaller at center 3 than at centers 1 and 2 (P = 0.03–0.04). Only for SUVmax was variability significantly lower in soft-tissue lesions than in bone lesions (RCs, 18.2% vs. 26.1%; P = 0.04). Repeatability of the uptake metrics showed a trend toward dependency on lesion size but not on absolute SUV (Fig. 4).
Test and retest TLU and ARTV values also correlated well (R2 and ICC ≥ 0.97), although variability was significantly larger than for SUV, and the predefined variability threshold of 30% or less was not met (RCs > 46.4%) (Fig. 5). Mean TLU was significantly larger in patients from center 2, yet its variability was significantly lower only than that of center 1 (40.5% vs. 56.0%; P = 0.02). Even when evaluated on a per-region basis, RCs remained significantly higher than those of the SUV metrics and were not influenced by lesion type.
Assessing variability of the 18F-FDHT uptake metrics on a per-patient basis improved the repeatability of all uptake metrics (Table 3; Fig. 6). RCs of SUV decreased by 6% on average, which was significant for SUVmax and SUVmean. The improvement was larger for volumetric measures, with RCs of TLU and ARTV decreasing by 12.7% and 23.1%, respectively. This was mainly caused by a large decrease in ARTV variability at centers 2 and 3 after averaging the data. Prostate-specific antigen level, Gleason score, weight, and age did not affect repeatability, nor did differences in total injected activity or uptake time after injection between the 2 scans (R2 < 0.08) (Fig. 7).
Normalization to Parent Plasma Input Curve
Adequate blood samples were available from 21 of the 27 patients, with a total of 103 lesions. Normalizing SUV to AUC significantly decreased the overall repeatability on both lesion and patient bases for centers 1 and 3 (Tables 3 and 4). This was mainly due to large differences (>50%) in whole-blood activity concentrations between the test and retest samples of 2 patients. When these outliers were removed, repeatability for centers 1 and 3 improved, and only slight changes in RCs on an overall lesion basis were observed after normalization (SUVmax: 29.9%; SUVmean: 30.3%; SUVpeak: 21.6%). The same was seen for RCs on a per-patient level (SUVmax: 25.6%; SUVmean: 23.8%; SUVpeak: 16.3%).
Lesion Selection
Inclusion of up to the 5 most avid lesions per patient did not significantly affect repeatability for any of the uptake metrics. When these lesions were assessed on a per-patient basis, RCs were similar to those before lesion selection. Likewise, including only the single most avid lesion, lesions of 4.2 mL or more, or lesions with an SUV of 4 g/mL or more did not significantly affect repeatability. Decreases in RCs ranged from 0% to 6.5% across all uptake metrics.
Reproducibility
Reproducibility between readers was excellent for SUVmax and SUVpeak, with discrepancies between the readers in only 2 of 300 measurements for SUVmax and 12 of 140 lesions for SUVpeak. Lesions showing discrepancies were close to regions of high physiologic uptake (e.g., liver, urinary tract, or vascular structures) or showed diffuse uptake (e.g., diffuse disease in the pelvis). Both metrics showed high reproducibility (ICC: 1.00) and a low COV (≤0.20%).
The remaining semiquantitative uptake measures were more dependent on volume-of-interest definition. The correlation between both readers for SUVmean was still excellent (ICC: 0.99), but the variation was significantly higher (COV: 2.3%). TLU and ARTV were less reproducible than all SUV metrics (COV: 10.8% and 10.4%, respectively), yet the ICCs remained above 0.98.
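For reference, the interobserver statistics can be sketched as follows. The text does not state which ICC form or COV definition was used; the sketch assumes the Shrout–Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measure) and a within-lesion COV computed as the root mean square of each lesion's SD/mean across readers. Both choices are assumptions, the function names are hypothetical, and the toy data are invented.

```python
import math
import statistics


def icc_2_1(rows):
    """Shrout-Fleiss ICC(2,1). rows: one tuple per lesion, one value per reader."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)             # between-lesion mean square
    ms_c = ss_cols / (k - 1)             # between-reader mean square
    ms_e = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)


def within_subject_cov(rows):
    """Root mean square of per-lesion SD/mean across readers, in % (assumed COV definition)."""
    ratios = [statistics.stdev(r) / statistics.mean(r) for r in rows]
    return 100.0 * math.sqrt(statistics.mean(q * q for q in ratios))


perfect = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
offset = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]  # reader 2 reads 1 unit higher
print(icc_2_1(perfect), icc_2_1(offset))  # 1.0 and approximately 0.667
```

Under absolute agreement, a constant offset between readers lowers the ICC even though their values are perfectly correlated, as the second example shows.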
DISCUSSION
This multicenter prospective study assessed the repeatability and reproducibility of 18F-FDHT, both of which are key components of the tracer’s analytic validation as a clinical biomarker. Repeatability of SUV metrics was superior to that of volumetric metrics, with RCs ranging from 16.4% to 17.8% on a patient basis and from 21.3% to 24.6% on a region basis. As a necessary step in biomarker development, this study demonstrated the feasibility of 18F-FDHT PET/CT imaging in a multiinstitutional setting and satisfied the requirement to evaluate the biomarker’s test–retest repeatability (17). In the current era of AR-directed mCRPC drug development, such biomarkers can serve as pharmacodynamic, prognostic, and response indicators (6–9).
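To make the practical meaning of an RC concrete: an observed change is commonly regarded as exceeding test–retest variability (at roughly the 95% level) only if it is larger than the RC. The sketch below applies the lesion-level SUVmax RC reported here (24.6%) to 2 hypothetical follow-up values. It assumes the change is expressed relative to the mean of the 2 measurements, mirroring the assumed RC definition; the helper names are hypothetical, and this is not a validated response criterion.

```python
RC_SUVMAX_LESION = 24.6  # % (lesion-level SUVmax RC reported in this study)


def relative_change_percent(baseline, follow_up):
    """|follow_up - baseline| / mean(baseline, follow_up) x 100 (assumed, mirrors the RC definition)."""
    return abs(follow_up - baseline) / ((baseline + follow_up) / 2) * 100.0


def beyond_measurement_noise(baseline, follow_up, rc_percent):
    """True if the observed relative change exceeds the repeatability coefficient."""
    return relative_change_percent(baseline, follow_up) > rc_percent


print(beyond_measurement_noise(8.0, 6.6, RC_SUVMAX_LESION))  # False: about a 19.2% change
print(beyond_measurement_noise(8.0, 5.6, RC_SUVMAX_LESION))  # True: about a 35.3% change
```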
Most studies on test–retest repeatability in PET/CT have evaluated 18F-FDG uptake. The PERCIST guidelines recommend a change in SUV of more than 30% to define a meaningful change in clinical status for both disease response and progression (15). Weber et al. evaluated 18F-FDG PET/CT imaging in 74 patients with non–small cell lung cancer in a clinical trial at 9 institutions and reported thresholds of a 28%/32% decrease and a 39%/47% increase in SUVmax and SUVpeak, respectively, as most indicative of actual therapeutic effects (16). However, multiple technical and logistic factors can affect these measurements, including differences in volume-of-interest delineation, the magnitude of uptake metrics, and uptake time after intravenous injection, as well as difficulties in adhering to the protocol in a multiinstitutional setting (18,19). Similar studies have been conducted with other radiotracers in patients with prostate cancer. Coefficients of variation of 14% and 7% were reported for SUVmax and SUVmean, respectively, on 18F-NaF PET/CT in patients with mCRPC (20). In a study using 18F-fluoromethylcholine in patients with mCRPC, RCs ranging from 22% to 26% were reported for different SUV metrics (21); that study also reported that the RCs of metabolically active tumor volume and TLU were significantly larger than those of SUV (36% and 33%, respectively). Other studies also using SUVmax-based thresholds showed similar results (22,23), yet a significant decrease in repeatability was seen when only lesions of 4.2 mL or less were included in the analysis. Studies have also shown decreased variability when repeatability is evaluated on a per-patient (as opposed to a per-lesion) basis (14,21).
Normalization to Parent Plasma Input Curve
Two studies have shown a correlation (R2: 0.6–0.7) between nonlinear regression analysis of dynamic 18F-FDHT data and SUV (5,14). Additionally, preliminary results showed a near-perfect correlation when the SUV was normalized to the AUC (R2: 0.99). A potential advantage of normalization to the parent plasma input curve is that the uptake metrics are corrected for any treatment-induced or other changes in the radiotracer’s metabolism, albeit at the expense of an additional dynamic PET scan, venous blood sampling, and metabolite analysis. Moreover, including an additional variable in uptake metric calculations can increase uncertainty (14,21), although in the present study SUVAUCpp did not significantly affect overall variability of any of the SUV metrics on a lesion level. One outlier showed unexplained large differences in whole-blood activity concentrations between the test and retest scans that could not be accounted for by sample measurement errors, suggesting the need for caution when this normalization is used for response assessment.
Our study had limitations. To mitigate possible confounders, all lesions were delineated by 2 independent readers. For SUVmax and SUVpeak, reproducibility was nearly perfect, and differences in SUVmean between readers were small. Moreover, differences in uptake time between the test and retest scans did not affect repeatability, suggesting that the influence of this factor was minor. However, reproducibility data from 2 readers are insufficient to make strong statements about agreement across a larger pool of readers and will require validation. Patients with castration-resistant prostate cancer often present with numerous metastatic lesions, and ideally each lesion should be delineated and assessed. Because this is impractical in routine clinical practice, we predefined anatomic regions; even so, 20% of patients had 10 or more evaluable regions. Several other, simpler lesion selection criteria were also investigated, and none of them changed variability.
CONCLUSION
Metrics derived from 18F-FDHT PET/CT show high repeatability and interobserver reproducibility. Among 18F-FDHT uptake metrics, SUV had the best repeatability, and although ARTV and TLU showed good correlation, variability was higher.
DISCLOSURE
This study was funded by a Movember Foundation Global Action Plan award. Memorial Sloan Kettering Cancer Center is supported by NIH/NCI Cancer Center Support Grant P30 CA008748. No other potential conflict of interest relevant to this article was reported.
Footnotes
Received for publication December 4, 2017.
Accepted for publication January 20, 2018.
Published online Apr. 6, 2018.
© 2018 by the Society of Nuclear Medicine and Molecular Imaging.