Abstract
There has been no established qualitative system of interpretation for therapy response assessment using PET/CT for head and neck cancers. The objective of this study was to validate the Hopkins interpretation system for assessing therapy response and survival outcome in patients with head and neck squamous cell cancer (HNSCC). Methods: The study included 214 patients with biopsy-proven HNSCC who underwent a posttherapy PET/CT study between 5 and 24 wk after completion of treatment. The median follow-up was 27 mo. PET/CT studies were interpreted independently by 3 nuclear medicine physicians. The studies were scored on a qualitative 5-point scale for the primary tumor, the right and left neck, and the overall assessment. Scores 1, 2, and 3 were considered negative for tumor, and scores 4 and 5 were considered positive for tumor. The Cohen κ coefficient (κ) was calculated to measure interreader agreement. Overall survival (OS) and progression-free survival (PFS) were analyzed with Kaplan–Meier plots and compared with the Mantel–Cox log-rank test and the Gehan–Breslow–Wilcoxon test. Results: Of the 214 patients, 175 were men and 39 were women. Agreement between the readers was 85.98%, 95.33%, 93.46%, and 87.38% for the overall, left neck, right neck, and primary tumor site response scores, respectively. The corresponding interreader κ coefficients were 0.69–0.79, 0.68–0.83, 0.69–0.87, and 0.79–0.86, respectively. The sensitivity, specificity, positive predictive value, negative predictive value, and overall accuracy of the therapy assessment were 68.1%, 92.2%, 71.1%, 91.1%, and 86.9%, respectively. Cox multivariate regression analysis showed that human papillomavirus (HPV) status and PET/CT interpretation were the only factors associated with PFS and OS.
Among the HPV-positive patients (n = 123), there was a significant difference in PFS (hazard ratio [HR], 0.14; 95% confidence interval, 0.03–0.57; P = 0.0063) and OS (HR, 0.01; 95% confidence interval, 0.00–0.13; P = 0.0006) between patients whose scores were negative for residual tumor and those whose scores were positive. A similar significant difference in PFS and OS was observed for all patients. There was also a significant difference in PFS between patients with PET-avid residual disease at one site and those with residual disease at multiple sites (HR, 0.23; log-rank P = 0.004). Conclusion: The Hopkins 5-point qualitative therapy response interpretation criteria for head and neck PET/CT have substantial interreader agreement and an excellent negative predictive value, and they predict OS and PFS in patients with HPV-positive HNSCC.
Approximately 550,000 new cases of head and neck cancer occur annually worldwide (1). Most head and neck cancers are squamous cell in origin. Well-known risk factors for head and neck cancers are tobacco use, alcohol consumption, and human papillomavirus (HPV) infection (2). The incidence of HPV-associated head and neck squamous cell carcinoma (HNSCC) is increasing, and these tumors most commonly arise from the oropharynx (3). Surgery, radiotherapy, and concurrent chemoradiation therapy are accepted standard treatment options for patients with HNSCC. Despite advances in therapeutic techniques, the incidence of locoregional recurrence remains high (15%–50%), and distant metastases occur in 9% of patients. Early identification of recurrence and assessment of therapy response would greatly benefit patients and could potentially improve survival (4,5).
PET combined with CT using 18F-FDG is useful in the evaluation of HNSCC, including diagnosis, staging, therapy assessment, and follow-up (6–12). Studies have shown that pretreatment 18F-FDG PET/CT is useful for accurate staging and for predicting disease recurrence and survival (13). Similarly, multiple studies have shown that posttreatment 18F-FDG PET/CT is useful for evaluating treatment response, detecting recurrence (14), and predicting outcomes and survival (15,16). Despite the value of PET/CT in therapy assessment, no established qualitative interpretation criteria for head and neck PET/CT have been published. The objective of this study was to validate interpretation criteria for therapy assessment with head and neck PET/CT (Hopkins Criteria) and to establish their accuracy, reader reliability, and predictive value for survival outcomes in patients with HNSCC.
MATERIALS AND METHODS
Eligible Patients and Follow-up
This was a retrospective study performed under a waiver of informed consent approved by the Institutional Review Board, and the guidelines of the Health Insurance Portability and Accountability Act were followed. Two hundred fourteen patients (175 men and 39 women; mean age ± SD, 58 ± 10 y) with primary HNSCC who were evaluated and treated at our institution between May 2000 and January 2013 were included in the study. Patients with histopathologically confirmed HNSCC were included if they underwent a baseline 18F-FDG PET/CT study and a posttherapy assessment 18F-FDG PET/CT study at our institution between 5 and 24 wk after completion of radiation therapy or chemoradiotherapy. Patients without a baseline PET/CT study, without prior biopsy-proven recurrence, or with a posttreatment PET/CT study later than 24 wk after completion of treatment were excluded. We considered posttreatment 18F-FDG PET/CT performed later than 6 mo after completion of therapy as follow-up rather than posttherapy assessment. The posttreatment PET/CT studies were ordered at the treating clinician’s discretion as part of therapy assessment.
Image Analysis
Head and Neck PET/CT Interpretation Criteria (Hopkins Criteria)
The studies were scored on a qualitative 5-point scale for the primary tumor (Fig. 1), the right and left neck (Fig. 2), and the overall assessment. Activity in the internal jugular vein (IJV) was used as the background blood-pool reference. Focal 18F-FDG uptake less than that of the IJV was scored 1, consistent with complete metabolic response. Focal 18F-FDG uptake greater than IJV but less than liver was scored 2, likely complete metabolic response. Diffuse 18F-FDG uptake greater than IJV or liver was scored 3, likely inflammatory changes. Focal 18F-FDG uptake greater than liver was scored 4, likely residual tumor. Focal and intense 18F-FDG uptake greater than liver was scored 5, consistent with residual tumor. A new lesion not present on baseline imaging was classified as progressive disease (Table 1). The overall score, which denotes the overall assessment, is the highest of the scores for the primary tumor and the right and left neck. The Hopkins interpretation criteria were based on 18F-FDG uptake because previous studies have shown that patient outcome is determined by residual 18F-FDG uptake regardless of residual lymph node size.
Definition of Positive and Negative PET/CT Studies
On the basis of the qualitative 5-point scale, the studies were classified as positive or negative for the primary tumor, right neck, left neck, and overall assessment. Scores 1, 2, and 3, which represent complete metabolic response, likely complete metabolic response, and likely postradiation inflammation, respectively, were considered negative for residual tumor. A score of 4 or 5 at the primary site or neck nodes, which represents likely residual tumor or residual tumor, respectively, was considered positive for residual tumor.
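The scoring and dichotomization rules above can be condensed into a small decision function. The sketch below is a hypothetical encoding for illustration only, not software used in the study: the reader's visual judgments (focal versus diffuse pattern, uptake relative to the IJV blood pool and the liver, and whether focal uptake above liver is "intense") are assumed inputs, because the criteria are applied by visual comparison rather than from measured values. Treating diffuse uptake below the IJV as score 1 is also an assumption, since the criteria do not define that case, and new lesions (progressive disease) are not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Reader's visual comparison of lesion uptake with the reference regions."""
    BELOW_IJV = 1              # less than internal jugular vein blood pool
    ABOVE_IJV_BELOW_LIVER = 2  # greater than IJV but less than liver
    ABOVE_LIVER = 3            # greater than liver


@dataclass
class SiteFinding:
    """Hypothetical encoding of one site's visual read (primary, right neck, or left neck)."""
    focal: bool            # True = focal uptake, False = diffuse uptake
    level: Level           # uptake relative to IJV and liver
    intense: bool = False  # reader judged focal uptake above liver as "intense"


def site_score(f: SiteFinding) -> int:
    """Map a visual read to the 5-point Hopkins score for one site."""
    if not f.focal:
        # Diffuse uptake above IJV (or liver) -> 3, likely inflammatory changes.
        # Assumption: diffuse uptake below IJV is treated as score 1.
        return 3 if f.level != Level.BELOW_IJV else 1
    if f.level == Level.BELOW_IJV:
        return 1  # complete metabolic response
    if f.level == Level.ABOVE_IJV_BELOW_LIVER:
        return 2  # likely complete metabolic response
    return 5 if f.intense else 4  # residual tumor vs likely residual tumor


def overall_score(primary: SiteFinding, right_neck: SiteFinding, left_neck: SiteFinding) -> int:
    """The overall score is the highest of the three site scores."""
    return max(site_score(primary), site_score(right_neck), site_score(left_neck))


def is_positive(score: int) -> bool:
    """Scores 4-5 are positive for residual tumor; scores 1-3 are negative."""
    return score >= 4


# Example: diffuse uptake at the primary, focal uptake above liver in the right neck.
primary = SiteFinding(focal=False, level=Level.ABOVE_LIVER)   # -> 3
right = SiteFinding(focal=True, level=Level.ABOVE_LIVER)      # -> 4
left = SiteFinding(focal=True, level=Level.BELOW_IJV)         # -> 1
score = overall_score(primary, right, left)
print(score, is_positive(score))  # 4 True
```

Taking the maximum of the site scores means a single positive site makes the whole study positive, which is how the overall assessment is defined above.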
Reader Qualifications
The PET/CT studies were retrieved from the Johns Hopkins Hospital PACS and interpreted independently by 3 board-certified nuclear medicine physicians (readers 1, 2, and 3) according to the 5-point scoring system (Table 1), using the MimVista viewing platform (version 5.2, MimVista Software Inc.). Reader 1 completed a National Institutes of Health T32 PET/CT research fellowship after nuclear medicine board certification; reader 2 is a current clinical PET/CT fellow who is board-certified in nuclear medicine; and reader 3 is a current second-year nuclear medicine resident who is already board-certified in nuclear medicine outside the United States.
RESULTS
Patient Characteristics and Follow-up
Two hundred and fourteen patients were included in the study (175 men, 39 women). Eleven patients (5.1%) were below the age of 40 y, 116 patients (54.2%) were between the ages of 41 and 60 y, and 87 patients (40.7%) were above the age of 60 y. A history of smoking was present in 144 patients (67.3%), and a history of alcohol consumption was present in 131 patients (61.2%). HPV was positive in 123 patients (57.5%). The primary site of tumor was classified as oropharynx (63.1%), oral cavity (5.1%), larynx (18.7%), and other sites (13.1%) (Supplemental Table 1; supplemental materials are available at http://jnm.snmjournals.org). The median follow-up of these patients was 27 mo (range, 1–108 mo) after completion of posttherapy assessment PET/CT. All patients were followed up until death or August 2013.
Time Interval of Posttherapy PET/CT
All 214 PET/CT studies were performed between 5 and 24 wk after treatment. The average interval between the date of completion of treatment and the posttreatment 18F-FDG PET/CT study was 12.5 ± 3.6 wk. Of the 214 studies, 19 (8.9%) were performed between 5 and 7 wk, 81 (37.9%) were performed between 8 and 12 wk, and 114 (53.3%) were performed between 13 and 24 wk after completion of treatment.
Reader Classification of PET/CT Studies
On the basis of the individual reader scores, 46 of 214 (21.5%), 45 of 214 (21.0%), and 44 of 214 (20.6%) studies were categorized as positive for residual tumor and 168 of 214 (78.5%), 169 of 214 (79.0%), and 170 of 214 (79.4%) as negative for residual tumor in the overall assessment. For both the overall assessment and the individual sites of residual disease, the final read was assigned when at least 2 of the 3 readers agreed on the dichotomous classification (positive or negative). In the final read, 45 of 214 studies (21.0%) were positive and 169 of 214 studies (79.0%) were negative for residual tumor by the overall assessment (overall read categorization). In addition, 31 of 214 (14.5%), 21 of 214 (9.8%), and 15 of 214 (7.0%) studies were categorized as positive for residual tumor at the primary tumor site, the right neck, and the left neck, respectively (site categorization). The Cohen κ coefficients showed good interreader agreement; Supplemental Table 2 summarizes the analysis of interreader agreement.
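The 2-of-3 final-read rule and the pairwise Cohen κ can be sketched as follows. The function names and the 6 reads are hypothetical toy values, not study data. The κ function is the standard unweighted two-reader statistic, κ = (p_o - p_e)/(1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each reader's category frequencies; it is written to accept either dichotomous calls or 5-point scores, because the exact input used for the reported κ values is an assumption here.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def majority_read(r1: Sequence[bool], r2: Sequence[bool], r3: Sequence[bool]) -> list[bool]:
    """Final read: positive when at least 2 of the 3 readers call the study positive."""
    return [(int(a) + int(b) + int(c)) >= 2 for a, b, c in zip(r1, r2, r3)]


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Observed proportion of studies on which two readers give the same call."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two readers: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each reader's marginal category frequencies.
    p_e = sum((ca[k] / n) * (cb[k] / n) for k in set(ca) | set(cb))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical dichotomous reads (True = positive for residual tumor) for 6 studies.
r1 = [True, True, False, False, True, False]
r2 = [True, False, False, False, True, False]
r3 = [False, True, False, True, True, False]

print(majority_read(r1, r2, r3))      # [True, True, False, False, True, False]
print(round(cohen_kappa(r1, r2), 3))  # 0.667
```

With 3 readers and a binary call, a 2-of-3 majority always exists, so every study receives a final read.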
Accuracy of Scoring System
The diagnostic accuracy of the scoring system for each reader and for the overall assessment was calculated on the basis of at least 2 of the 3 readers agreeing on the dichotomous classification (positive or negative for tumor). Supplemental Table 3 summarizes the diagnostic accuracy values. Of the 45 studies considered positive by at least 2 readers in the overall assessment, 32 were true-positive (12 confirmed by tissue diagnosis and 20 by 6-mo clinical follow-up) and 13 were false-positive (5 confirmed by tissue diagnosis and 8 by clinical follow-up).
Sixteen (7.5%) of the 214 studies showed new lesions on the posttherapy scan. Of the 169 studies considered negative by the overall assessment (overall categorization), 154 (91.1%) were confirmed as true-negative by 6-mo clinical follow-up, and 15 (8.9%) were false-negative (7 confirmed by tissue diagnosis and 8 by 6-mo clinical follow-up). Of the 214 studies, 44 were scored 3 (likely postradiation inflammation). Thirteen (29.5%) of these studies were performed within 12 wk of completion of treatment, and 31 (70.5%) were performed more than 12 wk after completion of treatment. Of the patients scored 3, 6 (13.6%) had disease recurrence and 38 (86.4%) were disease free during the 6-mo follow-up period. Examples of studies interpreted as score 3 are illustrated in Supplemental Figures 1 and 2. The scoring system had a sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall accuracy of 68.1%, 92.2%, 71.1%, 91.1%, and 86.9%, respectively (Supplemental Table 3).
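The reported diagnostic metrics follow directly from the overall-assessment counts given in this section: 32 true-positive (12 + 20), 13 false-positive (5 + 8), 154 true-negative, and 15 false-negative (7 + 8) studies. The short sketch below (the class name is illustrative) recomputes them from those counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2 x 2 counts of PET/CT calls versus the reference standard."""
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Overall-assessment counts: TP = 12 (tissue) + 20 (follow-up), FP = 5 + 8,
# TN = 154, FN = 7 + 8.
overall = ConfusionMatrix(tp=12 + 20, fp=5 + 8, tn=154, fn=7 + 8)

for name in ("sensitivity", "specificity", "ppv", "npv", "accuracy"):
    print(f"{name}: {100 * getattr(overall, name):.1f}%")
```

This prints 68.1%, 92.2%, 71.1%, 91.1%, and 86.9%, matching the reported values; the high NPV (154/169) reflects the small number of false-negative studies relative to true-negative ones.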
Kaplan–Meier Survival Curves: Therapy Assessment Score and Survival Outcome in All Patients (n = 214)
The median follow-up of the study population was 27 mo (range, 1–108 mo) from the date of the PET/CT study, and 38 patients (17.7%) died within the period of the study. Of the 214 patients, 63 were found to have disease progression during the follow-up period from the date of the scan to death or the last patient encounter at our institution. Of these, progression was confirmed in 25 (39.7%) patients by tissue diagnosis and 38 (60.3%) patients by imaging and clinical follow-up. The average duration to progression from the date of the scan was 10.8 ± 11.5 mo.
The median survival of the 45 patients with positive studies was 16 mo (range, 2–64 mo), and 19 patients (42.2%) in this group died. In contrast, in the group with negative PET/CT studies on the overall assessment, the median survival was 29 mo (range, 1–108 mo), and 18 patients (10.6%) died. Kaplan–Meier survival analysis showed a significant difference in overall survival (OS) between patients classified as negative for residual tumor by the 5-point scale interpretation and those scored as positive for residual tumor (log-rank, Mantel–Cox P < 0.0001), with a hazard ratio (HR) of 0.046 (95% confidence interval [CI], 0.018–0.120) (Fig. 3). For progression-free survival (PFS), Kaplan–Meier analysis also showed a significant difference between patients scored negative and those scored positive for residual tumor (log-rank, Mantel–Cox P < 0.0001), with an HR of 0.05 (95% CI, 0.02–0.11) (Fig. 3).
There was no significant difference in OS between patients who had residual disease at a single site (primary site, right neck, or left neck) and those with residual disease at multiple sites (log-rank, Mantel–Cox P = 0.072), with an HR of 0.38 (95% CI, 0.130–1.091). However, PFS differed significantly between these 2 groups (log-rank, Mantel–Cox P = 0.004), with an HR of 0.23 (95% CI, 0.085–0.635) (Fig. 4).
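The Kaplan–Meier product-limit estimator behind these survival curves multiplies, at each event time, the fraction of at-risk patients who do not have the event. The sketch below is an illustrative implementation on hypothetical toy data, not the software or data used in the study; patients censored at a given time are kept in the risk set at that time, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, S(time)) at each distinct event time.

    times  -- follow-up time per patient (e.g., months from the PET/CT study)
    events -- 1 if the event (death or progression) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients leaving the risk set at each time (events + censored)
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            # Conditional probability of surviving past t, given survival up to t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def median_survival(curve):
    """Earliest event time at which S(t) falls to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical toy cohort: follow-up in months, 1 = died, 0 = censored.
times = [2, 3, 3, 5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.875), (3, 0.75), (5, 0.6), (8, 0.45), (12, 0.225)]
print(median_survival(curve))                # 8
```

Censored patients do not lower the curve themselves; they only shrink the denominator for later event times.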
Kaplan–Meier Survival Curves: Therapy Assessment Score and Survival Outcome in HPV-Positive Patients (n = 123)
Of the 214 patients included in the study, 123 had a positive HPV test. Of these, 16 (13.0%) were positive and 107 (87.0%) were negative for residual disease by the overall assessment score of the final read of the 3 readers. Twenty-three patients had disease progression, and 5 patients died during follow-up. Kaplan–Meier survival analysis showed a significant difference in PFS (HR, 0.14; 95% CI, 0.03–0.57; log-rank test P = 0.0063; Gehan–Breslow–Wilcoxon test P = 0.0084) and OS (HR, 0.01; 95% CI, 0.00–0.13; log-rank test P = 0.0006; Gehan–Breslow–Wilcoxon test P = 0.0001) between patients classified as negative for residual tumor by the 5-point scale interpretation and those scored as positive for residual tumor (Fig. 5).
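The Mantel–Cox log-rank and Gehan–Breslow–Wilcoxon tests differ only in how each event time is weighted: the log-rank test weights all event times equally, whereas the Gehan–Breslow–Wilcoxon test weights them by the number of patients at risk, which emphasizes early differences between curves. The sketch below implements both as a two-group weighted log-rank statistic on hypothetical toy data, with the P value taken from a χ² distribution with 1 degree of freedom. The hazard ratio it returns uses the simple observed/expected approximation, (O1/E1)/(O2/E2); this is an assumed estimator for illustration, not necessarily the one behind the reported HRs.

```python
import math


def weighted_logrank(times1, events1, times2, events2, weight="mantel-cox"):
    """Two-group weighted log-rank test.

    weight="mantel-cox"    -> w(t) = 1 at every event time (standard log-rank test)
    weight="gehan-breslow" -> w(t) = n(t), the total number at risk (Gehan-Breslow-Wilcoxon)
    Returns (chi_square, p_value, hazard_ratio); the hazard ratio compares group 1
    with group 2 using the observed/expected approximation (O1/E1)/(O2/E2).
    """
    if weight not in ("mantel-cox", "gehan-breslow"):
        raise ValueError(f"unknown weight: {weight}")
    event_times = sorted({t for t, e in zip(times1, events1) if e}
                         | {t for t, e in zip(times2, events2) if e})
    u = var = 0.0
    obs1 = obs2 = exp1 = exp2 = 0.0
    for t in event_times:
        n1 = sum(1 for x in times1 if x >= t)  # at risk just before t
        n2 = sum(1 for x in times2 if x >= t)
        d1 = sum(1 for x, e in zip(times1, events1) if e and x == t)
        d2 = sum(1 for x, e in zip(times2, events2) if e and x == t)
        n, d = n1 + n2, d1 + d2
        e1 = d * n1 / n  # expected group-1 events under the null hypothesis
        w = 1.0 if weight == "mantel-cox" else float(n)
        u += w * (d1 - e1)
        if n > 1:  # hypergeometric variance of d1
            var += w * w * d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
        obs1 += d1
        obs2 += d2
        exp1 += e1
        exp2 += d * n2 / n
    chi2 = u * u / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    hr = (obs1 / exp1) / (obs2 / exp2)
    return chi2, p, hr


# Hypothetical toy data: months of follow-up, 1 = event observed, 0 = censored.
t_neg, e_neg = [6, 10, 14, 20, 24, 30], [0, 1, 0, 1, 0, 0]  # scored negative
t_pos, e_pos = [2, 3, 5, 7, 9, 12], [1, 1, 1, 0, 1, 1]      # scored positive

for w in ("mantel-cox", "gehan-breslow"):
    chi2, p, hr = weighted_logrank(t_neg, e_neg, t_pos, e_pos, weight=w)
    print(f"{w}: chi2={chi2:.2f}, P={p:.4f}, HR={hr:.2f}")
```

An HR below 1 for the negative-versus-positive comparison indicates a lower event rate in patients scored negative, which is the direction of the HRs reported above.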
Added Value of Posttreatment PET/CT Score to Clinical Assessment
We also evaluated whether the posttreatment PET/CT study added value to the clinical assessment at the time of the study. Of the 214 patients, 205 (95.8%) underwent a PET/CT study after completion of treatment as part of routine posttherapy assessment, without clinical suspicion of residual disease, and 9 (4.2%) underwent a PET/CT study because of suspected residual disease. PET/CT identified recurrence (confirmed by histopathology or clinical follow-up within 6 mo of the PET/CT study) in 40 of the 205 patients (19.5%) who had no prior clinical suspicion of disease. Among the 9 patients with clinically suspected residual disease, PET/CT identified disease in 5 (55.6%) and excluded disease in 4 (44.4%), who remained disease-free during the 6-mo clinical follow-up.
DISCUSSION
The primary aim of this study was to validate interpretation criteria for therapy response assessment (Hopkins Criteria) for head and neck PET/CT and to establish their reader reliability, accuracy, and predictive value for PFS and OS in patients with HNSCC, especially those with HPV-positive HNSCC. Our study showed that the Hopkins Criteria for posttherapy response assessment have substantial interreader agreement and an NPV of 91% and predict OS and PFS in patients with HNSCC. We also demonstrated that the interpretation criteria added value to the clinical posttherapy response assessment of patients with HNSCC at the time of the scan: they identified residual disease in 19.5% of patients who underwent a routine posttherapy assessment PET/CT without prior clinical suspicion and excluded residual disease in 44% of patients with prior clinical suspicion of residual disease.
Treatment response is an important factor in management planning and prognosis in HNSCC. Clinical examination, conventional imaging methods such as CT and MR imaging, and histopathologic examination after endoscopy are widely used for therapy response assessment. However, these methods have been reported to have variable diagnostic accuracy (17,18). PET/CT has been shown to predict response after treatment and to help detect residual or recurrent disease early, allowing salvage therapy to be implemented, while its prediction of complete response can avoid unnecessary intervention (19,20). Known limitations, however, include low PPVs attributed to inflammation and posttreatment effects such as edema, fibrosis, asymmetry, and anatomic distortion. The high NPVs observed in these studies indicate that a negative posttreatment scan suggests the absence of active disease, which can influence treatment planning (21).
No established interpretation system has been described in the literature to help readers classify posttreatment PET/CT findings in a systematic and reproducible manner in patients with HNSCC. Previous studies have not used specific interpretation criteria to classify PET/CT findings in HNSCC patients evaluated for therapy response after systemic treatment. Moeller et al. (22) evaluated 98 patients with head and neck cancer who underwent 18F-FDG PET/CT between 5 and 12 wk after treatment completion. The authors measured 18F-FDG uptake as the maximum standardized uptake value (SUVmax) and found that SUVmax thresholds of 6.5 for the primary tumor and 2.8 for the neck nodes had the highest accuracy for predicting treatment failure. At these thresholds, the sensitivity, specificity, PPV, and NPV were 70%, 93.7%, 58.3%, and 96.1%, respectively, for the primary tumor and 75%, 76.1%, 27.3%, and 96.2%, respectively, for the neck nodes. Gourin et al. (23) evaluated 32 patients with HNSCC who underwent 18F-FDG PET/CT 8–11 wk after completion of chemoradiation. The authors considered a PET/CT study positive if 18F-FDG uptake was significantly more intense than muscle and vessel background uptake. Using an SUVmax cutoff of 3.0, they found that the sensitivity, specificity, PPV, and NPV of PET/CT for predicting residual disease were 40%, 91%, 67%, and 77%, respectively. Compared with these studies, which were performed at comparable intervals after treatment, our results show similar accuracy for the primary tumor and better accuracy for the overall assessment.
An interpretation system helps clarify equivocal findings that are routinely encountered when reviewing posttherapy PET/CT and improves the standardization of visual interpretation. Our interpretation system was designed with these issues in mind. The interreader reliability of our interpretation system is similar to that of the Deauville 5-point scale criteria for patients with lymphoma (24). Barrington et al. found good interreader agreement for the Deauville Criteria, with Cohen κ coefficients of 0.85 (95% CI, 0.74–0.96) for sites with 18F-FDG uptake greater than liver and 0.79 (95% CI, 0.67–0.90) for sites with 18F-FDG uptake greater than mediastinal uptake (25). Similarly, Biggi et al., in a study of 260 patients, found κ coefficients for interreader agreement ranging from 0.69 to 0.84, indicating good interreader agreement for the Deauville Criteria in lymphoma therapy assessment. The accuracy of our interpretation system is also similar to that of the Deauville Criteria, for which the sensitivity, specificity, PPV, NPV, and accuracy were 73%, 94%, 73%, 94%, and 91%, respectively (26). The Hopkins interpretation criteria show relatively lower sensitivity and PPV, likely related to radiation-induced inflammation, than the Deauville Criteria, which were implemented in patients with lymphoma treated primarily with chemotherapy alone. The Hopkins scoring system shows relatively high specificity and NPV, which are the most important benefits for patients in clinical practice.
The results of this study need to be interpreted in light of its limitations. HPV status was not available for all patients, especially those treated earlier in the study period. The clinical suspicion before each PET/CT study was determined retrospectively from electronic medical and imaging records rather than prospectively from clinicians. The PFS data were accurate within each patient’s follow-up period at our institution, but some patients may have had clinical follow-up outside our institution. Mortality data were obtained from a public registry and from patient records at our hospital, and there may be a lag between death and the registry update. However, this approach has frequently been used in other studies to establish survival outcomes (27,28).
CONCLUSION
The proposed Hopkins interpretation criteria are a simple qualitative method with substantial interreader agreement and a high NPV, and they can predict OS and PFS in patients with HNSCC. The criteria add value to posttherapy clinical assessment by identifying residual disease in patients without prior clinical suspicion and by excluding disease in those suspected of having residual disease.
DISCLOSURE
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734. No potential conflict of interest relevant to this article was reported.
Footnotes
- Received for publication December 27, 2013.
- Accepted for publication May 19, 2014.
- Published online June 19, 2014.
- © 2014 by the Society of Nuclear Medicine and Molecular Imaging, Inc.
REFERENCES