Introduction

Radiation therapy (RT) is the mainstay of treatment for early and locally advanced head and neck squamous cell carcinoma (HNSCC). Improvements in identification of the tumor volume of head and neck tumors using imaging such as 2-[18F] fluoro-2-deoxy-d-glucose (FDG) positron emission tomography with computed tomography (PET/CT) have facilitated radiation treatment planning by improved target volume delineation and more accurate target localization, which is critical for intensity-modulated radiotherapy (IMRT). Sharp dose gradients between the high-dose region targeted at the tumor and adjacent low-dose normal tissue regions in IMRT improve the therapeutic ratio between tumor control and radiation-related toxicity, although this is reliant on the accurate identification of the tumor extent. Yet despite advanced IMRT techniques and integration of advanced radiological imaging such as PET/CT, locoregional failure still occurs in 30–50 % of locally advanced HNSCC, largely within the high-dose region [1, 2]. Such variable treatment responses argue for the need to characterize metrics that allow the a priori identification of patients at high risk of treatment failure and death.

Currently the American Joint Committee on Cancer (AJCC) staging, which utilizes a uni-dimensional tumor size, local anatomic invasion, nodal involvement, and presence of metastatic disease, is the most widely accepted and applied prognostic system in cancer [3]. Yet much attention has been called to its weaknesses, specifically in its ability to identify HNSCC patients at high risk of recurrence [4–6]. PET/CT has been increasingly integrated into diagnostic staging and radiation planning for HNSCC [7, 8], and has been demonstrated to be an accurate and sensitive imaging modality for the post-treatment evaluation of patients with HNSCC compared to clinical exam and CT alone [9, 10]. More recently, PET/CT variables including maximum standardized uptake value (SUVmax) and metabolic tumor volume (MTV) are emerging as potential radiological biomarkers in patients with HNSCC [11–17].

Volumetric indices have been proposed to risk stratify patients. Studies have reported that the primary gross tumor volume (GTV) correlates with outcomes and survival in patients with HNSCC undergoing curative surgery [18], radiation [19–21], or combined chemoradiation treatments in various head and neck cancer sites [4–6, 22, 23]. Given the interest in PET-based imaging, the MTV has recently been explored as a combined volumetric and metabolic radiological biomarker. Studies have reported the predictive power of MTV in patients with head and neck cancer undergoing chemoradiation [12, 13].

The relative importance between SUVmax, GTV, and MTV of the primary tumor in the risk stratification of HNSCC patients has not been determined. This retrospective study sought to compare the prognostic utility of SUVmax, GTV, and MTV with respect to disease outcome and survival in patients with HNSCC undergoing IMRT with or without concomitant chemotherapy.

Materials and methods

Patient selection

The study was conducted as a retrospective review approved by the institutional review board (IRB). Informed consent was waived by the IRB. Fifty-one newly diagnosed HNSCC patients treated between January 2007 and March 2011 underwent IMRT (with or without chemotherapy) and had PET/CT imaging obtained prior to start of IMRT. Forty-one patients met the inclusion criteria. In total 10 patients were excluded: 6 had synchronous or metachronous malignancies within 3 years prior to HNSCC diagnosis, 2 had unknown primary, 1 had sino-nasal cancer, and 1 died shortly after treatment from non-cancer related causes (sepsis). All patients were staged according to the 2002 AJCC classification [24].

PET/CT protocol

All PET/CT studies were performed on a GE Discovery STE 16 (General Electric, Milwaukee) PET/CT scanner. Patients were scanned skull base to mid-thigh in treatment position on a flat table. Patients were injected with an average of 13.6 ± 3.3 mCi of 18F-FDG and incubated for an average period of 63.0 ± 5.9 min. The amount of injected radioactivity was routinely measured by quantification of the radioactivity of the syringe before and after injection. All patients were scanned using a dedicated head and neck protocol. Head and neck images were acquired with the arms down and body images were obtained with arms up from clavicle to mid-thigh. Body images were obtained first, followed by head and neck images and then low-dose deep inspiration images of the chest.

The dedicated head and neck PET scans were done using 2D imaging with emission scans lasting between 5 and 6 min, and a field of view (FOV) of 30 cm. The matrix size was 128 × 128, and slice thickness was 3.3 mm. The CT images were obtained with a matrix of 512 × 512. Beam collimation was 10 mm with a pitch of 0.984. Table speed was 9.84 mm/rotation and the slice thickness was 0.625 mm. kV of 120 and mAs of 440 were used. Intravenous (IV) contrast was administered by power injection (General Electric, Milwaukee) of 60 ml of Optiray IV (Tyco Healthcare/Mallinckrodt, Hazelwood) after a 40 s delay for the head and neck images. There was a second bolus after 110 ml of IV contrast was given for the body section of the study. CT images were reconstructed to the PET slice thickness to match the PET and to create fused images. In addition, CT images were reconstructed at 1.25 mm with 1.25 mm spacing in soft tissue and bone algorithm for review.

PET/CT image analysis

All PET/CT studies were electronically retrieved from archives and reviewed on a GE Advantage Workstation by a single, board-certified radiologist with neuroradiology and nuclear medicine fellowship training. PET, CT, and fused PET/CT images were displayed in axial, coronal, and sagittal planes. For the purposes of this study, the relevant imaging parameter measurements were the primary tumor SUVmax and MTV segmented from PET. MTV was defined as the tumor volume with FDG uptake segmented by a gradient-based method. The commercially available MIMvista software analysis suite (MIM Software Inc., Cleveland, OH) includes a contouring suite for radiation therapy planning and a PET/CT fusion suite. Once the primary tumor (target) was segmented, SUVmax and MTV were automatically calculated by the MIMvista software. The gradient and threshold segmentation methods of volume measurement available in MIMvista software have been described previously and rely on an operator-defined starting point near the center of the lesion [25, 26]. As the operator drags the cursor out from the center of the lesion, six axes extend out, providing visual feedback for the starting point of gradient segmentation. Spatial gradients are calculated along each axis interactively, and the length of an axis is restricted when a large gradient is detected along that axis. The six axes define an ellipsoid that is then used as an initial bounding region for gradient detection. The MTV and SUVmax within the bounding region are automatically calculated.
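To make the final volume step concrete, the following is a minimal sketch of how SUVmax and MTV follow from a segmented region once a mask is available. It is not the MIMvista implementation: the function names, the toy SUV array, and the simple cutoff used to create the mask are hypothetical, and the voxel size assumes that the reconstructed PET voxel equals the stated field of view divided by the matrix size, with the stated slice thickness.

```python
import numpy as np

# Voxel geometry taken from the acquisition described in the text:
# 30 cm field of view over a 128 x 128 matrix, 3.3 mm slices.
# Assumption: reconstructed pixel size = field of view / matrix size.
FOV_MM = 300.0
MATRIX = 128
SLICE_MM = 3.3


def voxel_volume_cc(fov_mm=FOV_MM, matrix=MATRIX, slice_mm=SLICE_MM):
    """Volume of one PET voxel in cubic centimetres (1 cc = 1000 mm^3)."""
    pixel_mm = fov_mm / matrix
    return pixel_mm * pixel_mm * slice_mm / 1000.0


def suvmax_and_mtv(suv, mask, voxel_cc):
    """Return (SUVmax, MTV in cc) for the voxels selected by a boolean mask."""
    if not mask.any():
        raise ValueError("segmentation mask is empty")
    return float(suv[mask].max()), float(mask.sum() * voxel_cc)


# Toy volume (hypothetical values, not patient data): a 4 x 4 x 4 hot block
# on a uniform background, with one hottest voxel in the middle.
suv = np.ones((10, 10, 10))
suv[3:7, 3:7, 3:7] = 8.0
suv[5, 5, 5] = 15.8

# Stand-in for the output of a segmentation tool; a real gradient-based
# method would produce this mask differently.
mask = suv > 4.0

suv_max, mtv_cc = suvmax_and_mtv(suv, mask, voxel_volume_cc())
```

Under these assumptions each voxel is roughly 0.018 cc, so the volume reported for a lesion is simply the number of segmented voxels multiplied by that value, whatever method produced the mask.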

IMRT treatment planning

Patients underwent CT simulation (Brilliance CT Big Bore, Philips Medical Systems, Cleveland, OH) in the supine position immobilized with a custom thermoplastic mask. The radiation planning CT acquisition encompassed the vertex of the scalp to at least 5 cm below the clavicle using 2–3 mm slice thickness. Treatment planning was performed using Philips Pinnacle3 software suite (version 6.0 to 8.0m, Philips Medical Systems, Fitchburg, WI). GTVs were contoured incorporating diagnostic CT, PET, and/or MR images. To aid GTV contouring, PET/CT images were fused using Philips Pinnacle3 software suite prior to 2008 or MIMVista version 5.1.2 (MIMVista Corp., Cleveland, OH) after 2008. Structures on the planning CT contoured by the physician included: GTV, clinical target volume (CTV), planning target volume (PTV), and organs at risk including critical normal tissue organs adjacent to the target volumes. GTVs were manually contoured for IMRT by a single board-certified radiation oncologist and the volumes were then calculated by the software when generating dose volume histograms. No autosegmentation was used to create GTVs. Volumetric expansions from GTV to CTV were 7–15 mm (respecting normal tissue planes) followed by a 3–5 mm expansion to PTV. IMRT plans were designed with seven to ten 6 MV photon beams, using an inverse optimization algorithm with normalization such that 95 % of PTV was covered with the prescription dose (66–70 Gy), with the goal of no more than 1 % of PTV receiving less than 93 % of prescription dose, and no more than 1 % or 1 cc of the tissue outside the PTV receiving more than 110 % of prescription dose. Elective nodal areas and regions at risk for subclinical disease were treated to 54–60 Gy using a dose painting technique.
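The two PTV coverage goals quoted above can be expressed as a simple check on the per-voxel doses inside the PTV. The sketch below is illustrative only and is not part of the Pinnacle3 planning system: the function name, the dictionary keys, and the toy dose values are hypothetical, and equal-sized voxels are assumed so that voxel fractions stand in for volume fractions.

```python
import numpy as np


def ptv_coverage_report(ptv_dose_gy, prescription_gy):
    """Summarise PTV coverage against the two PTV goals quoted in the text.

    Assumes equal-sized voxels, so a fraction of voxels equals a fraction
    of PTV volume.
    """
    dose = np.asarray(ptv_dose_gy, dtype=float)
    frac_at_rx = float(np.mean(dose >= prescription_gy))
    frac_below_93 = float(np.mean(dose < 0.93 * prescription_gy))
    return {
        "v100": frac_at_rx,              # fraction receiving the prescription dose
        "below_93": frac_below_93,       # fraction receiving < 93 % of prescription
        "meets_v100": frac_at_rx >= 0.95,
        "meets_cold_limit": frac_below_93 <= 0.01,
    }


# Toy PTV of 1000 voxels (hypothetical doses, not patient data).
ptv_dose = np.concatenate(
    [np.full(960, 70.5), np.full(35, 67.0), np.full(5, 64.0)]
)
report = ptv_coverage_report(ptv_dose, prescription_gy=70.0)
```

In this toy case 96 % of the PTV receives the prescription dose and 0.5 % falls below 93 % of it, so both goals are met. The hot-spot goal for tissue outside the PTV would need the dose grid outside the target and is left out of the sketch.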

Treatment

All patients were treated with definitive IMRT. The GTV was treated to a median dose of 69.96 Gy (range 66.0–69.96 Gy), over a median of 33 fractions (range 32–33), and a median of 48 days (range 39–72 days). Concurrent chemotherapy was given to 36 (87.8 %) patients: 23 received cisplatin, 8 received carboplatin, and 5 received cetuximab. Of these 36 patients 15 also received induction chemotherapy.

Follow-up

Patients were followed after the conclusion of treatment, continuing until analysis or patient death. PET/CT was used to assess clinical response in addition to clinical exam at 3 months as part of standard treatment care. Disease recurrence was defined as the first site of failure including local failure, nodal failure or distant failure. All failures were confirmed by biopsy.

Statistical analysis

The statistical endpoints analyzed in this study were local control (LC), nodal control (NC), distant control (DC), overall control (OC), overall survival (OS) and disease-free survival (DFS), measured from the end of IMRT to the date of event, censoring patients at last follow-up or death. For OC, the event was the occurrence of first local, nodal, or distant relapse. For OS, the event was death due to any cause; for DFS, the event was death or disease relapse at any time after the end of IMRT.

The Kaplan–Meier product-limit method was used to estimate the probabilities of tumor control and survival rates at 2 years irrespective of follow-up length [27]. The comparison of survival rates among the groups was done using the two-tailed log-rank test. A probability value of less than 0.05 was considered statistically significant. All other statistical computations were performed using the SAS 9.1 system (SAS Institute, Cary, NC).
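For readers unfamiliar with these methods, the sketch below shows the product-limit estimate at a fixed time point and a two-group log-rank test in plain Python. It is not the SAS code used in the study: the function names and the toy follow-up data are hypothetical, and the p-value uses the chi-square approximation with one degree of freedom.

```python
import math


def kaplan_meier(times, events, horizon):
    """Product-limit estimate of the event-free probability at `horizon`.

    times  : follow-up time for each patient
    events : 1 if the event occurred at that time, 0 if censored
    """
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - failed / at_risk
    return surv


def log_rank_p(times_a, events_a, times_b, events_b):
    """Two-sided log-rank p-value for two groups (chi-square, 1 df)."""
    all_t = list(times_a) + list(times_b)
    all_e = list(events_a) + list(events_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e in zip(all_t, all_e) if e}):
        n_a = sum(1 for u in times_a if u >= t)   # at risk in group A
        n = sum(1 for u in all_t if u >= t)       # at risk overall
        d_a = sum(1 for u, e in zip(times_a, events_a) if u == t and e)
        d = sum(1 for u, e in zip(all_t, all_e) if u == t and e)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Toy follow-up in months (hypothetical values, not patient data).
times_small, events_small = [6, 12, 18, 24, 30], [0, 0, 0, 0, 0]
times_large, events_large = [3, 5, 9, 20, 28], [1, 1, 0, 1, 0]

control_small = kaplan_meier(times_small, events_small, horizon=24)
control_large = kaplan_meier(times_large, events_large, horizon=24)
p_value = log_rank_p(times_small, events_small, times_large, events_large)
```

Splitting a cohort at the median of a parameter and passing the two halves to these functions mirrors the structure of the comparisons reported in this study, although a validated statistics package remains the appropriate tool for a real analysis.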

Results

Patients and tumor characteristics

Non-white patients comprised 61 % of the patient cohort, with 71 % of patients presenting with stage IV disease. The overall median follow-up was 24.2 months (range 2.7–56.3 months) and 27.1 months (range 4.0–56.3 months) among surviving patients. Complete patient and tumor characteristics are described in Table 1. For the patient cohort the median SUVmax of the primary tumor was 15.8 (range 4.5–33.8), the median GTV was 22.2 cc (range 1.5–162.5 cc), and the median MTV was 7.2 cc (range 0.40–43.5 cc). Overall and sub-site specific PET/CT and tumor volume characteristics are described in Table 2.

Table 1 Patient characteristics of 41 head and neck cancer patients
Table 2 PET/CT and tumor volume characteristics of 41 head and neck cancer patients

Disease control and patterns of failure

Local, nodal, and distant recurrences occurred in 10, 5, and 7 patients, respectively, with a median time to recurrence of 2.4 months. The median time to local, nodal, and distant failure was 2.9, 2.2, and 2.2 months, respectively. The estimated 2-year actuarial LC rate was 77.7 %, NC rate was 87.7 %, and the DC rate was 82.0 %. The estimated actuarial DFS and OS rates at 2 years were 67.6 and 79.8 %, respectively, Table 3.

Table 3 PET/CT parameters, tumor volume and survival and disease control status

SUV parameters, tumor volume and treatment outcome

Table 3 describes the LC, NC, DC, OC, OS, and DFS of the patient cohort dichotomized by median SUV parameters and tumor volume measurements. SUVmax (<15.8 vs. ≥15.8) showed a non-significant trend for DC (95.0 vs. 69.7 %, p = 0.053). A smaller GTV (median cutoff of <22.2 cc) was associated with improved disease control and survival for all treatment outcomes including LC (100.0 vs. 56.4 %, p = 0.001) and OS (94.4 vs. 65.9 %, p = 0.045), Figs. 1, 2. A similar result was noted for MTV where patients with smaller MTVs (median cutoff of <7.2 cc) had improved LC and OS as compared to patients with larger MTV; LC (100.0 vs. 54.2 %, p < 0.001) and OS (94.7 vs. 64.2 %, p = 0.040), Figs. 3, 4.

Fig. 1 Local control according to median gross tumor volume (GTV)
Fig. 2 Overall survival according to median gross tumor volume (GTV)
Fig. 3 Local control according to median metabolic tumor volume (MTV)
Fig. 4 Overall survival according to median metabolic tumor volume (MTV)

Correlating T category, AJCC stage with SUV parameters and tumor volume

There was a significant correlation between GTV and MTV with a Pearson correlation coefficient of 0.53 (p < 0.0004). Smaller GTV (<22.2 cc) was associated with lower MTV (4.0 vs. 18.1 cc, p < 0.001). A significant association was also found between tumor volume measurements and SUV parameters with larger tumor volume associated with greater SUVmax, Table 4.
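As a rough plausibility check on the reported correlation, a Pearson coefficient can be converted to a t statistic with n − 2 degrees of freedom. The sketch below assumes that all 41 patients contributed a GTV and MTV pair, which the text does not state explicitly, and the function name is hypothetical.

```python
import math


def pearson_t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n paired values."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# r is taken from the text; n = 41 is an assumption (the full cohort).
t_stat = pearson_t_statistic(r=0.53, n=41)
degrees_of_freedom = 41 - 2
```

Under that assumption the statistic is about 3.9 on 39 degrees of freedom, which sits well beyond the conventional 0.05 threshold.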

Table 4 SUVmax, GTV, MTV by AJCC stage and tumor stage

AJCC stage correlated with SUVmax, but not GTV and MTV, though the number of patients with stage I, II and III disease was small. Compared to patients with AJCC stage I–III disease, stage IV disease patients had higher values of SUVmax (11.9 vs. 17.7, p = 0.013), larger GTV (19.8 vs. 38.6 cc, p = 0.109), and larger MTV (4.1 vs. 14.2 cc, p = 0.012). A non-significant trend was noted for SUVmax, GTV, and MTV with increasing T stage, Table 4.

Discussion

This study demonstrates that both GTV and MTV are superior prognostic radiological biomarkers of treatment outcome and survival for HNSCC patients undergoing definitive IMRT as compared to SUVmax. Improved local, nodal, and overall control rates were seen in patients with smaller GTV and MTV. SUVmax was found to correlate significantly with AJCC stage, GTV and MTV, although in this study it was not found to be prognostic for outcome.

Schwartz et al. [28] evaluated 54 patients with HNSCC undergoing definitive RT, including postoperative patients, with or without concurrent chemotherapy, and reported that an SUV greater than the median of 9 significantly correlated with inferior local control and disease-free survival. On univariate and multivariate analyses these data remained significant or borderline significant. Similarly, Machtay et al. [14] reported in a cohort of 60 HNSCC patients treated with definitive radiotherapy with or without concurrent chemotherapy that an SUVmax <9 (the median SUVmax of the study was 7.2) was associated with improved 2-year DFS of 72 versus 37 % (p = 0.007). Torizuka et al. [15] reported in 50 consecutive HNSCC patients who underwent definitive RT with or without chemotherapy, or surgery with or without postoperative RT, that an SUVmax of ≤7 significantly predicted higher rates of 2-year local control and disease-free survival. When adjusted for age and nodal stage these findings remained significant. However, the median SUVmax for the cohort was 10.53, and they did not identify how an SUVmax of 7 was selected as the optimal cut point. Limitations of comparing SUV as a radiological biomarker between studies include the use of different SUV cutoff values, which may be influenced by multiple factors including patient selection, differences in imaging technique, injected FDG dose, incubation period, protocol, scanner, and reconstruction algorithm variation [29–31].

Our study confirms the findings of Strongin et al. [22], who reported a series of 78 patients with stage III–IV oropharyngeal, laryngeal or hypopharyngeal cancer in which patients with locoregional failure had greater tumor volumes than patients free of disease (58 vs. 36.5 cc, p = 0.028), and those with a GTV <35 cc had significantly improved overall control (71 vs. 41 %), progression-free survival (61 vs. 33 %), and overall survival (84 vs. 41 %) rates. Chen et al. [6] demonstrated that a primary tumor volume of greater than 60 cc is superior to the AJCC and the TNM classification system when correlated with survival rates in patients with nasopharyngeal cancer who underwent definitive RT. Similarly, Studer et al. [4] reported that a GTV-based staging system was superior to TNM and AJCC systems in correlating with overall survival in patients with HNSCC of any site except the larynx undergoing definitive IMRT.

The current study demonstrated that the clinical GTV, based on PET/CT and clinical examination, appears to be prognostic, as it correlates with control and survival in HNSCC patients who were treated with definitive IMRT with and without induction and/or concurrent chemotherapy regimens. Given the expanding interest in metabolic and volumetric-based indices, we evaluated the prognostic utility of MTV, defined as the volume of tumor with FDG avidity. Chung et al. [12] reported on 64 patients with pharyngeal cancer undergoing definitive radiation therapy with or without concomitant chemotherapy. Patients with an MTV greater than 40 cc, a statistically optimized cut point, had a significantly worse disease-free survival than those with MTV ≤40 cc (HR 3.42, p = 0.04), using a raw SUV cutoff of 2.5 to define MTV within a radiologist-contoured margin of the primary tumor and areas of nodal disease. La et al. [13] recently demonstrated the predictive value of MTV in patients with head and neck cancer undergoing chemoradiation. MTV was defined by autosegmentation in three dimensions of the volume with 50 % or greater of SUVmax using custom software on pretreatment PET scans [13]. An increase in MTV of 17.4 cc or greater correlated with recurrence or death. MTVs correlated with the GTV with a correlation coefficient of 0.73, but consistently underestimated GTV, a finding confirmed in our study [13]. Similar to our findings, these studies failed to demonstrate a correlation between SUVmax and DFS or OS.

Given the capabilities of autosegmentation algorithms, MTV has the potential to become a standardized prognostic metric. We suspect that interest in the standardization of MTV will continue to grow as new algorithms are developed, but it is critical to understand the current limitations of MTV as a metric, including the lack of a standardized SUV threshold or cutoff, the lack of true correlation with anatomic structures, and the need to validate autosegmentation software. The impact of these limitations was clearly demonstrated by Ford et al. [32], who reported that a 5 % change in threshold contour can translate into a 200 % increase in contour volume, resulting in a significant dosimetric effect. Furthermore, MTV has significant limitations in defining target volumes for treatment planning, and using the MTV for treatment planning purposes alone could risk marginal treatment failure as it may underestimate the tumor volume. In contrast, GTV integrates multiple sources of information, including radiological and clinical examination findings.
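The dependence of a threshold-defined volume on the chosen cutoff can be illustrated with a synthetic lesion. The sketch below is not the method of Ford et al. and does not reproduce their figures: the Gaussian uptake profile, the 1 mm grid, and the function name are hypothetical, and the thresholds are expressed as fractions of the image SUVmax.

```python
import numpy as np


def threshold_volume_cc(suv, fraction, voxel_cc):
    """Volume of voxels at or above `fraction` of the image SUVmax."""
    return float(np.count_nonzero(suv >= fraction * suv.max()) * voxel_cc)


# Synthetic lesion (hypothetical, not patient data): a smooth Gaussian blob
# on a 1 mm grid, so each voxel is 0.001 cc.
axis = np.arange(-30, 31)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
suv = 1.0 + 14.0 * np.exp(-(x**2 + y**2 + z**2) / (2 * 8.0**2))

# Volume enclosed at three nearby relative thresholds.
volumes = {f: threshold_volume_cc(suv, f, 0.001) for f in (0.40, 0.45, 0.50)}
```

Lowering the threshold always enlarges the enclosed volume, and the size of the change depends on how steeply uptake falls off at the lesion edge, which is one reason volumes obtained with different cutoffs are hard to compare across studies.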

In summary, GTV and MTV demonstrate superior prognostic utility as compared to SUVmax: larger tumor volumes are associated with significantly inferior control and survival in HNSCC patients treated with definitive IMRT.