Abstract
The purpose of this study was to determine the negative predictive value (NPV) of a 12- to 14-wk posttreatment PET/CT for 2-y progression-free survival (PFS) and locoregional control (LRC) in patients with p16-positive locoregionally advanced oropharyngeal cancer (LA-OPC). This analysis was a secondary endpoint of NRG-HN002, a noncomparative phase II trial in patients with p16-positive LA-OPC (stage T1-T2, N1-N2b or T3, N0-N2b) and a smoking history of ≤10 pack-years. Patients were randomized in a 1:1 ratio to reduced-dose intensity-modulated radiotherapy (IMRT) with or without cisplatin. Methods: PET/CT scans were reviewed centrally. Tumor response evaluations for the primary site, right neck, and left neck were performed using a 5-point ordinal scale (Hopkins criteria). Overall scores were then assigned as negative, positive, or indeterminate. Patients with a negative score for all 3 evaluation sites were given an overall score of negative. For both PFS and LRC at 2 y after treatment, the null hypothesis of NPV ≤ 90% was tested against the alternative of NPV > 90% (1-sided α, 0.10). Results: A total of 316 patients were enrolled, of whom 306 were randomized and eligible. Of these, 131 (42.8%) patients consented to a posttherapy PET/CT, and 117 (89.3%) patients were eligible for PET/CT analysis. The median time from the end of treatment to PET/CT scan was 94 d (range, 52–139 d). Estimated 2-y PFS and LRC rates in the analysis subgroup were 91.3% (95% CI, 84.6, 95.8%) and 93.8% (95% CI, 87.6, 97.5%), respectively. Posttreatment scans were negative for residual tumor for 115 patients (98.3%) and positive for 2 patients (1.7%). NPV for 2-y PFS was 92.0% (90% lower confidence bound [LCB] 87.7%; P = 0.30) and for LRC was 94.5% (90% LCB 90.6%; P = 0.07). Conclusion: In the context of deintensification with reduced-dose radiation, the NPV of a 12- to 14-wk posttherapy PET/CT for 2-y LRC is estimated to be >90%, similar to that reported for patients receiving standard chemoradiation. However, there is insufficient evidence to conclude that the NPV is >90% for PFS.
Head and neck squamous cell cancer (HNSCC) is the ninth most common malignant tumor worldwide, responsible for about 2% of all cancer-related deaths (1). Human papillomavirus (HPV)–associated HNSCC is rising in incidence and affects a younger population (2,3). This subgroup of patients harbors HPV in their tumor cells, predominantly HPV-16, and the tumors occur mostly in the oropharynx. The prognosis for these patients is better, with overall survival (OS) at 3 y being about 82% in locally advanced HPV-positive HNSCC (4). Standard therapy for locoregionally advanced oropharyngeal SCC (OPSCC) is a combination of 70-Gy radiation therapy (RT) and concurrent platinum chemotherapy (5). Because survival outcomes are better in patients with HPV-associated OPSCC, various deintensification treatment strategies are currently being explored to reduce short- and long-term treatment-related toxicities in this population (6,7).
18F-FDG PET/CT has been shown to be a valuable imaging test in assessing treatment response in HNSCC. In a phase III randomized controlled study (n = 564) of patients treated with standard chemoradiation therapy, an 18F-FDG PET/CT–based surveillance strategy was noninferior to routine neck dissection in survival and was also cost-effective (8). Therefore, it is recommended that 18F-FDG PET/CT be performed about 12 wk or later after completion of chemoradiation therapy (9), to minimize false-positive results from radiation-induced inflammation.
The 5-point Hopkins criteria for posttherapy 18F-FDG PET/CT interpretation was established and validated to standardize the interpretation and reduce variability (10). Its reported accuracy is 86.4% (95% CI, 79.3%, 91.3%) with a negative predictive value (NPV) of 92.1% (95% CI, 86.9%, 95.3%) (9). The Hopkins scale is a standardized qualitative interpretation method designed for routine clinical practice. It has been recently shown to be equivalent in its performance compared with a more complex quantitative assessment method (11). It also predicts survival outcomes, both OS and progression-free survival (PFS) in HNSCC patients (9,10).
The Hopkins criteria was internally and externally validated (9,10) using mixed patient populations of HPV-positive and HPV-negative HNSCC. This study evaluates its performance metrics in HPV-positive, locally advanced oropharyngeal cancer patients receiving deintensified therapy. Specifically, we determine the NPV of 12- to 14-wk posttreatment 18F-FDG PET/CT for PFS and locoregional control (LRC) at 2 y in this population.
MATERIALS AND METHODS
NRG-HN002 is a multiinstitutional, noncomparative randomized phase II clinical trial (ClinicalTrials.gov identifier: NCT02254278). The trial determined the acceptability of 2 curative-intent strategies incorporating reduced-dose RT with or without cisplatin. This trial was designed to select the arm(s) meeting PFS (primary objective) and swallowing-related quality of life criteria (as measured by the M.D. Anderson Dysphagia Inventory [MDADI]; co-primary objective) for advancement to a definitive trial. The trial design, patients, inclusion/exclusion criteria, trial oversight, and definitions have already been described (6).
18F-FDG PET/CT Substudy and Patients
All patients eligible for NRG-HN002 were offered participation in an optional substudy to assess treatment response at 2 y based on 12- to 14-wk posttreatment 18F-FDG PET/CT scans. Of the 306 patients eligible for the parent study, 131 consented to participate. Of these, 117 patients received protocol treatment and had acceptable-quality scans and thus were eligible for analysis. Fourteen patients were excluded from these analyses (1 did not receive protocol treatment, 1 had the scan in the wrong format, and 12 had no scan).
18F-FDG PET/CT Imaging
All sites were instructed to follow an 18F-FDG PET/CT imaging protocol. The protocol specified a serum glucose level of < 200 mg/dL before the study, an uptake time of 60 ± 10 min, and both dedicated head and neck (orbits to the upper thorax) and whole-body (orbits to upper thigh) acquisitions. Recommended PET acquisition parameters for the whole-body examination were 6 bed positions and an acquisition of 2–5 min per bed position. The dedicated head and neck PET/CT typically followed the body examination. It included 2 bed positions (6 min per bed position), and the images were reconstructed into a 30-cm field of view with a 256 × 256 matrix. The recommended acquisition parameters for the low-dose CT scan were as follows: kV = 120; effective mAs = 90–150; gantry rotation time < 0.5 s; maximum reconstructed slice width = 2.5 mm (overlap acceptable); standard reconstruction algorithm; maximum reconstruction diameter = 30 cm; and no iodinated contrast. The PET/CT data were corrected for dead time, scatter, randoms, and attenuation using standard algorithms provided by the scanner manufacturers. For the dedicated head and neck views, a postprocessing filter with a full width at half maximum of approximately 5 mm was recommended.
18F-FDG PET/CT Image Interpretation: Hopkins Criteria
PET/CT scans were reviewed both centrally and locally by participating institutions. Tumor response evaluations for the primary site, right neck, and left neck were performed using a 5-point ordinal scale (Hopkins criteria) (10): score 1—definite complete metabolic response; score 2—likely complete metabolic response; score 3—likely inflammatory; score 4—likely residual metabolic disease; and score 5—definite residual metabolic disease. A score of 1 or 2 was interpreted as negative, 3 as indeterminate, and 4 or 5 as positive. An overall score was assigned using this collapsed 3-point categorization, with the highest score at any anatomic site determining the overall score.
In the central review, if at least 1 evaluation site was positive, the assigned overall score was positive. Patients with a negative score for all 3 evaluation sites were given an overall score of negative. This is a visual, qualitative analysis using internal jugular vein and liver uptake as internal controls.
In the local review, 6 patients had at least 1 evaluation site as positive and were assigned an overall positive score; 1 patient had a site score of positive and was given an overall score of indeterminate. Seven patients had site and overall scores of indeterminate; 3 patients had a site score of indeterminate and were ultimately given an overall score of negative. Patients with a negative score for all 3 evaluation sites were all given an overall score of negative.
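The default scoring rule described above (scores 1 and 2 negative, 3 indeterminate, 4 and 5 positive, with the worst site category giving the overall category) can be sketched as follows. This is a minimal illustration only: the function and variable names are hypothetical, and the sketch does not model the cases noted in the text in which reviewers assigned an overall score that differed from the worst site score.

```python
# Rank of each collapsed category; a higher rank is a "worse" reading.
CATEGORY_RANK = {"negative": 0, "indeterminate": 1, "positive": 2}


def categorize(score):
    """Collapse a 5-point Hopkins score into the 3-point categorization."""
    if score in (1, 2):
        return "negative"
    if score == 3:
        return "indeterminate"
    if score in (4, 5):
        return "positive"
    raise ValueError(f"Hopkins score must be an integer from 1 to 5, got {score!r}")


def overall_category(primary, right_neck, left_neck):
    """Default rule: the worst category across the 3 sites is the overall category.

    Assumption: no reviewer override is applied.
    """
    categories = [categorize(s) for s in (primary, right_neck, left_neck)]
    return max(categories, key=CATEGORY_RANK.get)


if __name__ == "__main__":
    print(overall_category(1, 2, 1))
    print(overall_category(1, 3, 1))
    print(overall_category(1, 3, 5))
```

Under this default rule, a scan is overall negative only when all 3 evaluation sites are scored 1 or 2.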
Statistical Analysis
Distributions of patients’ characteristics for those who did and did not consent to PET/CT imaging were compared using the χ2 test with a significance level of 0.05. Hazard ratios (HRs) for PFS and locoregional failure (LRF) for these 2 subgroups were estimated using the Cox proportional hazards models. Primary analyses included eligible patients who consented to PET/CT imaging and had a posttreatment PET/CT scan submitted for analysis regardless of timing. Sensitivity analyses included patients with scans 10–16 wk after the end of RT. Overall central scan review results were used in the primary analyses of the NPV. The level of agreement between overall local and central PET/CT reads on the 3-point scale was assessed using percent agreement and Brennan–Prediger’s and Gwet’s coefficients. The level of agreement for primary site, right neck, and left neck scores was measured using the weighted versions of the same coefficients with linear weights to account for different levels of disagreement between categories of Hopkins criteria.
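For readers unfamiliar with the agreement statistics named above, the sketch below computes percent agreement, the Brennan–Prediger coefficient, and Gwet's first-order coefficient (AC1) in their unweighted forms for 2 raters. It uses the standard textbook definitions, not the trial's analysis code; the function name and the toy ratings are hypothetical, and the linearly weighted versions used for the site-level scores are not shown.

```python
from collections import Counter


def agreement(rater_a, rater_b, categories):
    """Unweighted percent agreement, Brennan-Prediger, and Gwet's AC1 for 2 raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same nonempty set of scans")
    n = len(rater_a)
    q = len(categories)

    # Observed agreement: fraction of scans given the same category by both raters.
    pa = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Brennan-Prediger: chance agreement is fixed at 1/q.
    bp = (pa - 1 / q) / (1 - 1 / q)

    # Gwet's AC1: chance agreement uses the average marginal proportion per category.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pi = [(count_a[c] + count_b[c]) / (2 * n) for c in categories]
    pe = sum(p * (1 - p) for p in pi) / (q - 1)
    ac1 = (pa - pe) / (1 - pe)
    return pa, bp, ac1


if __name__ == "__main__":
    # Toy data (hypothetical): 10 scans read locally and centrally.
    local = ["negative"] * 8 + ["indeterminate", "positive"]
    central = ["negative"] * 9 + ["positive"]
    print(agreement(local, central, ["negative", "indeterminate", "positive"]))
```

Both coefficients are less sensitive than Cohen's kappa to highly skewed category prevalence, which is relevant when nearly all scans are negative.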
The primary purpose of analyzing 18F-FDG PET/CT in NRG-HN002 was to determine the NPV of 12- to 14-wk posttherapy 18F-FDG PET/CT for 2-y PFS and 2-y LRC. Failure for PFS endpoint was defined as local, regional, or distant progression or death due to any cause; rates were calculated by the Kaplan–Meier method. The LRF endpoint was defined as local or regional progression, salvage surgery of the primary tumor with tumor present or unknown, salvage neck dissection with tumor present or unknown > 20 wk after the end of RT, death due to study cancer without documented progression, or death due to unknown causes without documented progression; rates were calculated by the cumulative incidence method.
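As a reminder of how the Kaplan–Meier method handles censored follow-up when estimating a rate such as 2-y PFS, a minimal product-limit sketch is given below. The function names and the follow-up data are hypothetical, and the cumulative incidence method used for LRF is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times  : follow-up time for each patient
    events : 1 if the patient failed at that time, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated probability of remaining failure-free through time t."""
    estimate = 1.0
    for time, survival in curve:
        if time > t:
            break
        estimate = survival
    return estimate


if __name__ == "__main__":
    # Hypothetical follow-up in months; 0 marks a censored patient.
    months = [6, 12, 18, 24, 30, 36]
    failed = [1, 0, 1, 0, 0, 0]
    print(survival_at(kaplan_meier(months, failed), 24))
```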
NPV was calculated as the proportion of PET/CT-negative patients who remained progression-free at 2 y and, separately, as the proportion who maintained LRC (remained free of LRF) at 2 y. The binomial NPV estimates and exact CIs were calculated. For each endpoint (PFS and LRC), the null hypothesis of NPV ≤ 90% was tested against the alternative of NPV > 90% with a 1-sided binomial test at the 0.10 level. The power for these hypotheses was calculated under the alternative hypothesis of 95% NPV. With an estimated 140 available scans, the statistical power to reject the null hypothesis of NPV ≤ 90% was 76% under the protocol-specified design.
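The NPV estimate, the 1-sided exact binomial test, and the 1-sided exact (Clopper–Pearson) lower confidence bound described above can be sketched as follows. This is an illustration under stated assumptions rather than the trial's analysis code: the function names are hypothetical, and the counts in the usage example (104 event-free among 113 PET/CT-negative patients) are back-calculated here from the reported 92.0% NPV for PFS and are not stated in the text.

```python
from math import comb


def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def npv_test(n_negative, n_event_free, null_npv=0.90, alpha=0.10):
    """Return (NPV estimate, 1-sided exact P value, 1-sided exact lower bound).

    n_negative   : PET/CT-negative patients evaluable at 2 y
    n_event_free : those who remained event-free at 2 y
    """
    npv = n_event_free / n_negative

    # Exact test of H0: NPV <= null_npv against H1: NPV > null_npv.
    p_value = upper_tail(n_event_free, n_negative, null_npv)

    # Clopper-Pearson lower bound: the value of p at which the upper tail
    # probability equals alpha, found by bisection (the tail increases with p).
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(n_event_free, n_negative, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return npv, p_value, lo


if __name__ == "__main__":
    # Assumed counts, back-calculated from the reported percentage.
    npv, p_value, lcb = npv_test(113, 104)
    print(f"NPV = {npv:.3f}, P = {p_value:.2f}, 90% LCB = {lcb:.3f}")
```

With these assumed counts the P value is roughly 0.3, so the null hypothesis would not be rejected at the 0.10 level. The null hypothesis is rejected exactly when the lower bound exceeds 90%, which is why the results report the 90% LCB alongside each P value.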
RESULTS
Patients
NRG-HN002 opened to accrual on October 27, 2014, and completed accrual on February 7, 2017, with 316 patients enrolled, of whom 308 were randomized (306 eligible). A total of 117 patients consented and were eligible for PET/CT analysis (Supplemental Fig. 1; supplemental materials are available at http://jnm.snmjournals.org).
Supplemental Table 1 summarizes patient and tumor characteristics by PET/CT consent status. Overall, 131 eligible patients (42.8%) consented to the posttherapy PET/CT exam. The consent rate was comparable between arms. No significant differences in patient and tumor characteristics were found between consent status groups.
Supplemental Figure 2 summarizes the PFS analysis by consent status. The estimated HR (no consent vs. consent) was 1.77 (95% CI, 0.91, 3.41). Supplemental Figure 3 summarizes the LRF analysis by consent status; the estimated HR (no consent vs. consent) was 1.41 (95% CI, 0.65, 3.09).
Patient and Tumor Characteristics
Of 131 patients who consented to PET/CT imaging, 117 (89.3%) were eligible for analysis. Supplemental Table 2 shows patient and tumor characteristics for these patients. The median age of patients was 62 y (minimum–maximum, 39–84 y); 87.2% of patients were male, 90.6% were white, 81.2% had a Zubrod performance status 0, 54.7% had tonsil primary site, 64.1% had T2–3 disease, 76.9% had N2 disease, and 79.5% had bilateral RT planning. The mean time from the end of treatment to the PET/CT scan was 13.6 wk (SD = 1.9 wk; range and interquartile range, 7.4–19.8 and 12.7–14.4 wk, respectively).
Study Endpoints
PET/CT Central Review
Supplemental Table 3 summarizes the PET/CT scan central review results. Three patients had a site score of indeterminate but were ultimately given an overall score of negative. Overall, posttreatment scans for 115 of 117 patients (98.3%) were negative for residual tumor, and 2 (1.7%) were positive for residual tumor. For the primary site, posttreatment scans showed “definite complete metabolic response” in 113 patients (96.6%), “likely complete metabolic response” in 1 patient (0.9%), “likely inflammatory” uptake in 2 patients (1.7%), and “definite residual metabolic disease” in 1 patient (0.9%). Similar results were found for the right and left neck (Supplemental Table 3).
NPV of PET/CT for 2-y PFS
Table 1 summarizes the results for NPV for PFS at 2 y using central review results. Overall, the NPV for 2-y PFS was 92.0% (90% lower confidence bound [LCB], 87.7%; 95% CI, 85.4%, 96.3%) with P = 0.3, so the null hypothesis of NPV ≤ 90% for 2-y PFS was not rejected. Because P > 0.10, there is not enough evidence to conclude that the NPV of PET/CT for 2-y PFS is > 90%, although an NPV below 87.7% can be ruled out with 90% confidence. Comparable NPV results were found by treatment arm. On the intensity-modulated radiotherapy (IMRT) + cisplatin and IMRT arms, 57 and 58 patients were evaluable for NPV for PFS, respectively; 1 patient on each arm was censored for PFS. For patients with an overall PET/CT score of “positive for residual tumor,” 1 patient (50.0%) had a failure for 2-y PFS, and 1 patient (50.0%) did not have failure for 2-y PFS (Table 1).
TABLE 1. NPV per Central Review for 2-Year PFS
A sensitivity analysis to estimate the NPV was completed using evaluable patients with PET/CT scans completed 10–16 wk after RT. A total of 104 patients were included, with a resulting overall NPV for 2-y PFS equal to 92.2% (90% LCB, 87.6%; 95% CI, 85.1%, 96.6%) and P = 0.3. Again, given P > 0.10, there is not enough evidence to conclude that NPV of PET/CT for 2-y PFS is > 90% (Supplemental Table 4).
NPV of PET/CT for 2-y LRC
Table 2 summarizes the results for NPV for LRC at 2 y using central review results. The NPV for 2-y LRC was 94.5% (90% LCB, 90.6%; 95% CI, 88.5%, 98.0%) with P = 0.07, rejecting the null hypothesis of the NPV for 2-y LRC ≤ 90% in favor of the alternative hypothesis of NPV > 90%. The 90% LCB of 90.6% lies above the hypothesized (null) NPV of 90%. Results by treatment arm are also shown in Table 2. The NPV for 2-y LRC for the IMRT + cisplatin arm was 94.6% (90% LCB 88.5%). The NPV for 2-y LRC for the IMRT arm was 94.4% (90% LCB 88.0%). Of the 58 patients on the IMRT + cisplatin arm eligible for PET/CT analysis, 56 were evaluable for NPV for LRC; 2 patients were censored for LRC before the 2-y time point. Of the 59 patients on the IMRT arm eligible for PET/CT analysis, 56 were evaluable for NPV for LRC; 3 patients were censored for LRC before the 2-y time point. For patients with an overall PET/CT score of “positive for residual tumor,” 1 patient (50.0%) had failure for 2-y LRC and 1 patient (50.0%) did not have failure for 2-y LRC.
TABLE 2. NPV per Central Review for 2-Year LRC
A sensitivity analysis to estimate the NPV was completed using evaluable patients with PET/CT scans completed 10–16 wk after RT. A total of 101 patients were included, with a resulting overall NPV for 2-y LRC equal to 94.9% (90% LCB, 90.8%; 95% CI, 88.6%, 98.3%) and P = 0.06, again rejecting the null hypothesis of the NPV for 2-y LRC ≤ 90% in favor of the alternative hypothesis of NPV > 90% (1-sided α-level 0.10) (Supplemental Table 5).
PET/CT Local Assessment
When local assessment results were used, the NPV for 2-y PFS was 91.8% (90% LCB, 86.5%; 95% CI, 83.8%, 96.6%; P = 0.4 > 0.10). The NPV for 2-y LRC was 95.1% (90% LCB, 90.5%; 95% CI, 88.0%, 98.7%; P = 0.08 < 0.10). Therefore, the local assessments also provide evidence that the NPV for 2-y LRC is > 90%, but not that the NPV for 2-y PFS is > 90%. Results from the sensitivity analysis, using only scans completed 10–16 wk after RT, are similar for both endpoints (Supplemental Tables 6 and 7).
Local and central assessments by neck site and overall are shown in Supplemental Table 8. The percent agreement and Brennan–Prediger’s and Gwet’s agreement coefficients between overall local and central interpretation were 0.87 (95% CI, 0.80, 0.94), 0.80 (95% CI, 0.70, 0.91), and 0.86 (95% CI, 0.78, 0.94), respectively (Supplemental Table 8). The agreement coefficient estimates for primary site and right and left neck are also shown in Supplemental Table 8. These values suggest substantial agreement between local and central PET/CT interpretation for the overall score and for the primary site, left neck, and right neck. Disagreements mainly consisted of patients who were classified as having a definite complete metabolic response by central review but were assigned a likely complete metabolic response or likely inflammatory score by local assessment.
DISCUSSION
In this study of reduced-dose RT for patients with p16-positive, T1-T2 N1-N2b M0, or T3 N0-N2b M0 OPSCC (seventh edition staging) with ≤ 10 pack-years of smoking, we estimated the performance of the Hopkins criteria in predicting patient outcomes at 2 y from 12- to 14-wk posttreatment 18F-FDG PET/CT. On the basis of the central review, most posttreatment scans (98.3%) were negative for residual tumor, and the NPV was 94.5% for LRC and 92.0% for PFS. Similar NPVs were obtained on the basis of local site analysis.
The study population of this trial had a distinctly more favorable outcome profile than the study population of the original development and internal (10) and subsequent external validation (9) of the Hopkins criteria for interpretation of the 12- to 14-wk posttreatment 18F-FDG PET/CT. The study population from the original derivation study (n = 214) included many subsites of HNSCC patients (oropharynx 63.1%, oral cavity 5.1%, larynx 18.7%, and other sites 13.1%; 57.5% HPV-positive) who had higher progression and death rates (median follow up of 27 mo; 17.7% died and 29.4% had progression). The external validation study (ECLYPS) had a study population similar to the original derivation study, including various subsites (oropharynx 54.7%, oral cavity 6.3%, larynx 16.8%, and other sites 22.2%; 29.6% HPV-positive) and poorer outcome rates (13.6% died and LRF 20.8% at 2 y) (9). Compared with these 2 study populations, the NRG-HN002 population analyzed in this substudy included only patients with HPV-positive oropharyngeal cancer, and 2-y PFS was 87.6% or above and OS was 96.7% or above. Hence, this trial provides the performance characteristic (NPV) of the Hopkins criteria for posttreatment 18F-FDG PET/CT in a favorable deintensified outcome group.
One characteristic of the Hopkins criteria is a reduction in the number of indeterminate readings and in the uncertainty about inflammatory uptake. The number of patients with an indeterminate score (score 3, likely inflammatory) was low in this study (n = 1 for left neck, n = 0 for right neck, and n = 2 for the primary site), which is similar to that in the prior studies (9,12–14). This is most likely due to the standardized qualitative reads and the subsiding of radiation-induced inflammation by 12–14 wk after therapy. Compared with other interpretation criteria (such as NI-RADS, Porceddu, Deauville), the Hopkins criteria has been demonstrated to yield the fewest indeterminate interpretations (14). In addition, the number of patients with scores representing residual disease was extremely low (1.7%) in this study compared with the prior studies (9,10), owing to the favorable HPV-associated (2) oropharyngeal SCC population in this study, who responded well to treatment.
This study establishes the value of Hopkins criteria in a multicenter clinical trial setting. The advantage of standardized qualitative interpretation criteria is their ease and speed of deployment in a clinical practice setting (15) while maintaining accuracy similar to that of semiquantitative interpretation methods such as PERCIST (16) and other methods (11), which require more stringent standardization of scan acquisition and more complex analyses. The analysis suggests substantial agreement between local and central interpretation for the overall score and for the primary site, left neck, and right neck. In future studies, the level of agreement could be further optimized by including a training program or training set for site reads. Further, the added value of performing a PET/CT 3 mo after therapy in a favorable population could be established by first recording the clinical examination and therapy response judgment, then performing the PET/CT, revealing the results to the clinical team, and recording the final clinical judgment at 3 mo after therapy. Comparing the judgments before and after PET/CT would demonstrate the true added value of PET/CT to clinical judgment at this time point.
There are limitations to this secondary endpoint analysis of NRG-HN002. First, PET/CT was an optional method for therapy response assessment at the time this study was designed, and the actual sample size was lower than the projected sample size (113 vs. 140 patients). Second, patients at presumably higher risk did not opt in for PET/CT. However, this apparent finding was not statistically significant and was not explained by differences in tumor and patient characteristics between participants and nonparticipants in the PET/CT substudy. Third, although the protocol specified a posttreatment PET/CT at 12–14 wk, the actual timing of PET/CT varied around this window. However, the sensitivity analysis, which included PET/CT scans obtained at 10–16 wk after treatment (89%), led to the same conclusions regarding NPV of PET/CT as the analysis using all scans. Fourth, our study was not designed to compare either clinical evaluation or CT imaging versus PET/CT imaging, so we cannot comment on the relative adequacy of various follow-up methods in this low-risk group. Furthermore, NPV estimates close to the 2-y PFS and LRC rates suggest that only marginal additional information on 2-y posttreatment outcomes is gained using PET/CT around 12–14 wk after treatment. However, this result alone should not be used to determine the adequacy of PET/CT in this population. Other metrics such as specificity, sensitivity, and positive predictive value should be considered; none of these metrics can be properly and accurately estimated from this substudy.
CONCLUSION
Within the context of deintensification with reduced-dose radiation, the NPV of PET/CT performed around 12–14 wk after therapy for 2-y LRC is statistically > 90%, similar to that reported for patients receiving standard chemoradiation. However, there is insufficient evidence to conclude that the NPV is > 90% for PFS.
DISCLOSURE
The National Cancer Institute sponsored the study. This project was supported by grants U10CA180868 (NRG Oncology Operations), U10CA180822 (NRG Oncology SDMC), U24CA180803 (IROC), and UG1CA189867 (NRG Oncology NCORP) from the National Cancer Institute (NCI). This project is funded, in part, under a grant with the Pennsylvania Department of Health. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions. Dr. Caudell reports grants and honoraria from Varian Medical Systems and honoraria from Galera. Dr. Chung reports participation on the Advisory Board for Bristol-Myers Squibb, CUE, Sanofi, Mirati, Merck, Brooklyn ImmunoTherapuetics, and Exelixis. Dr. Geiger reports consulting fees from Merck, Exelis, and Regeneron. Dr. Gillison reports grants from Kura Oncology, Agenus, Genocea Biosciences, Inc., Roche, and Bristol Myers Squibb; consulting fees from Kura Oncology, Shattuck Labs, Inc., Nektar Therapeutics, Ispen Biopharmaceuticals Inc., EMD Serono, Inc., Gilead Sciences, Inc., Eisai Medical Research Inc., Istari Oncology, Inc., LLX Solutions, LLC, Onclive, Seagen, Debiopharm, Mirati Therapeutics, Sensei Biotherapeutics, Inc., BioNTech AG, and Coherus; participation on the Advisory Board for Kura DSMB, SQZ Biotech, and BioMimetix; and stock options from Sensei. Dr. Mell reports grants from Radiation Therapy Oncology Group Foundation, Merck Sharp & Dohme Corp., Syneos Health Inc., and NCI and consulting fees from Cel-Sci. Dr. Le reports consulting fees from Nanobiotix, Roche, and Coherus and honoraria from Johns Hopkins, 2021 China International Exchange and Promotive Association for Medical and Health Care (CPAM), Nasopharyngeal Cancer Branch Inaugural Conference and Minimally Invasive Surgery Training Course of Nasopharyngeal Cancer. No other potential conflict of interest relevant to this article was reported.
KEY POINTS
QUESTION: What is the NPV of 12- to 14-wk posttreatment 18F-FDG PET/CT for PFS and LRC at 2 y in HPV-positive, locally advanced oropharyngeal cancer receiving deintensified therapy?
PERTINENT FINDINGS: NRG-HN002 is a multiinstitutional, noncomparative randomized phase II clinical trial (ClinicalTrials.gov identifier: NCT02254278). The primary endpoint of this PET/CT substudy was the NPV for PFS and LRC at 2 y. The NPV of PET/CT performed around 12–14 wk after therapy for 2-y LRC is statistically > 90%, similar to that reported for patients receiving standard chemoradiation. However, there is insufficient evidence to conclude that the NPV is > 90% for PFS.
IMPLICATIONS FOR PATIENT CARE: 18F-FDG PET/CT performed around 12–14 wk after therapy has very high NPV for PFS and LRC in HPV-positive, locally advanced oropharyngeal cancer receiving deintensified therapy.
Footnotes
Received for publication May 25, 2022.
Revision received August 24, 2022.
Published online Sep. 2, 2022.
© 2023 by the Society of Nuclear Medicine and Molecular Imaging.