Abstract
Only a minority of esophageal cancers demonstrates a pathologic tumor response (pTR) to neoadjuvant chemotherapy (NAC). 18F-FDG PET/CT is often used for restaging after NAC and to assess response. Increasingly, it is used during therapy to identify unresponsive tumors and predict pTR, using avidity of the primary tumor alone. However, definitions of such metabolic tumor response (mTR) vary. We aimed to comprehensively reevaluate metabolic response assessment using accepted parameters, as well as novel concepts of metabolic nodal stage (mN) and metabolic nodal response (mNR). Methods: This was a single-center retrospective U.K. cohort study. All patients with esophageal cancer staged before NAC with PET/CT and after with CT or PET/CT and undergoing resection from 2006 to 2014 were identified. pTR was defined as Mandard tumor regression grade 1–3; imaging parameters included metrics of tumor avidity (SUVmax/mean/peak), composites of avidity and volume (including metabolic tumor volume), nodal SUVmax, and our new concepts of mN stage and mNR. Results: Eighty-two (27.2%) of 301 patients demonstrated pTR. No pre-NAC PET parameters predicted pTR. In 220 patients restaged by PET/CT, the optimal tumor ΔSUVmax threshold was a 77.8% reduction. This was as sensitive as the current PERCIST 30% reduction, but more specific with a higher negative predictive value (P < 0.001). ΔSUVmax and Δlength independently predicted pTR, and composite avidity/spatial metrics outperformed avidity alone. Although both mTR and mNR were associated with pTR, in 82 patients with 18F-FDG–avid nodes before NAC we observed mNR in 10 (12.2%) not demonstrating mTR. Conclusion: Current definitions of metabolic response are suboptimal and too simplistic. Composite avidity/volume measures improve prediction. mNR may further improve response assessment, by specifically assessing metastatic tumor subpopulations, likely responsible for disease relapse, and should be urgently assessed when considering aborting therapy on the basis of mTR alone.
In the United States and Europe, the mainstay of curative treatment of esophageal cancer is neoadjuvant chemotherapy (NAC) or chemoradiotherapy (NACR) followed by surgery (1,2). Both confer important survival benefits (3); however, up to 60% of tumors show either minimal or no pathologic response (pTR) to NAC (4,5), and a similarly poor response is seen in 30%–40% after NACR (6). For these patients, such therapy is futile in retrospect and delays surgery, potentially allowing disease progression and a worse prognosis (7). The ability to predict pTR at the outset would therefore be invaluable, as it would allow personalized therapy, with neoadjuvant therapy being omitted or changed to alternative therapy in those patients unlikely to benefit.
The evidence for the predictive value of baseline molecular markers and PET is insufficiently robust to justify major treatment changes (8,9). Interval assessment of response during therapy is, therefore, the next best option for personalizing therapy. Interval tumor metabolic response (mTR) on PET predicts pTR, albeit imperfectly. A 35% reduction in SUVmax is most commonly used during therapy (10,11) and formed the basis of the landmark MUNICON trial, wherein NAC was continued after a single cycle only in patients with a reduction in SUVmax greater than 35% (12); the alternative PERCIST criteria recommend a 30% reduction after NAC to define mTR (13). However, these thresholds may not be optimal: PERCIST is neither tumor- nor context-specific, whereas the MUNICON threshold was derived from just 40 patients; furthermore, SUVmax provides no spatial information. More fundamentally, both assess only the primary tumor; the high rates of disease recurrence seen even in patients with pathologically responsive primary tumors suggest important unidentified factors, perhaps involving nodal or distant micrometastases. A recent report described tumor downstaging after NAC (a reduction from pretreatment clinical to posttreatment pathologic stage) to be strongly associated with survival (14).
With this in mind, we recently explored the novel concepts of 18F-FDG–avid nodal stage (mN stage) and metabolic nodal response (mNR) and demonstrated major clinical implications for identifying disease progression during NAC, independent of primary tumor stage and response (15).
In this study, we aimed to reexamine comprehensively the utility of PET/CT in predicting pTR to NAC. First, we assessed the predictive ability of clinical, pathologic, and imaging factors available before NAC. Second, we aimed to define and compare optimal thresholds of mTR after NAC and assess, for the first time, to our knowledge, the novel concept of mNR. Third, we aimed to generate and validate predictive models that might have clinical utility.
MATERIALS AND METHODS
Patients and Staging Protocol
All patients who underwent potentially curative surgical resection of esophageal/gastroesophageal junctional cancer and were staged initially with CT and 18F-FDG PET/CT were identified from a departmental database (May 2006 to November 2014) (16). This included all cell types. The study was approved by the institutional clinical governance department, and the need for written informed consent was waived. Patients were also staged with endoscopic ultrasound and laparoscopy for tumors extending below the diaphragm as previously described (16). Examinations were reported by a consultant upper gastrointestinal radiologist/gastroenterologist using the contemporary American Joint Committee on Cancer TNM staging manual (sixth (17) or seventh edition (18)).
NAC
NAC was considered for all patients with disease more advanced than T1N0. Patients with esophageal and gastroesophageal junction (GEJ) Siewert 1/2 tumors (19) received either cisplatin and 5-fluorouracil (5-FU; 2 cycles; n = 182) (20); oxaliplatin and 5-FU (2 cycles; n = 46) (21); epirubicin, cisplatin, and 5-FU (ECF; 3 cycles; n = 7); epirubicin, cisplatin, and capecitabine (ECX; 3 or 4 cycles; n = 22) (22); epirubicin, oxaliplatin, and capecitabine (EOX; 3 cycles; n = 3); cisplatin and etoposide (2 cycles; n = 1); or oxaliplatin and capecitabine (2 cycles; n = 1). Patients with type 3 GEJ tumors received ECX/EOX/ECF (3 cycles). Some patients (distal esophageal/GEJ) received 3 cycles of ECX pre- and postoperatively with (n = 7) or without bevacizumab (n = 20) (23), or 3 cycles of ECF pre- and postoperatively (n = 12) (24).
Restaging CT and PET/CT
Patients were restaged 4–6 wk after NAC using CT before 2008 and PET/CT afterward (although a small minority underwent CT because of clinical trial protocols) as previously described (16). 18F-FDG PET/CT was performed using 1 of 2 scanners. Before November 3, 2009, scans were obtained on a Discovery STE (GE Healthcare) 60 min after injection of 400 MBq of 18F-FDG. Images were reconstructed using a time-of-flight ordered-subset expectation maximization reconstruction algorithm (2 iterations, 20 subsets; field of view, 50 cm; matrix, 128; voxel size, 3.9 × 3.9 × 3.3 mm; 2D). After November 3, 2009, scans were obtained on a Discovery 690 (GE Healthcare) 90 min after injection of 18F-FDG (4 MBq/kg). Images were reconstructed using a time-of-flight ordered-subset expectation maximization reconstruction algorithm (2 iterations, 24 subsets; gaussian filter, 6.4 mm; field of view, 50 cm; matrix, 256; voxel size, 1.95 × 1.95 × 3.3 mm; 3D). Examinations were independently reported by 2 dedicated PET/CT radiologists.
Operations
Surgery was typically performed within 2 wk of the restaging scan. A minimum 2-field lymphadenectomy was performed as standard.
Data and Variables
Patient variables included age, sex, and American Society of Anesthesiologists grade (25); pretreatment tumor variables were cell type, grade (26), anatomic site, T stage (seventh edition), N stage (sixth edition because data were insufficient for conversion to the seventh), and whether the tumor was impassable at esophagogastroduodenoscopy. PET/CT variables are described below. NAC variables comprised regimen type (dual- or triple-agent, grouped because of the large number of regimens and small patient groups) and time (d) from staging scan to restaging scan and from restaging scan to surgery, to adjust for delays and the number of cycles given. pTR was defined as Mandard Tumor Regression Grade (TRG) of 3 or less, after dedicated review by a consultant cellular pathologist (27). The Mandard TRG was used in preference to alternative TRGs, being the most frequently used TRG for esophageal cancer (28), with optimal prediction of survival (29,30).
PET/CT Variables
Variables comprised primary tumor 18F-FDG avidity (SUVmax and length [cm]), mN stage, mNR, and SUVmax of the most 18F-FDG–avid node. The development of mN stage and mNR has been described previously (15). mN stage (nodes visible discretely from the tumor, within a standard lymphadenectomy territory, with SUVmax > 2.5 or background mediastinal blood pool) comprised mN0 (0 avid nodes), mN1 (1–2 nodes), and mN2 (>2 nodes). mNR comprised complete metabolic response or partial metabolic response (reduction in mN or SUVmax ≥ 30%), stable metabolic disease (stable mN or reduction/progression SUVmax < 30%), or progressive metabolic disease (progression of mN or SUVmax ≥ 30%).
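The mN stage and mNR categories defined above can be sketched in a few lines of Python. This is an illustrative reading of the definitions, not the authors' implementation: the function names are hypothetical, and checking progression before response when the two criteria conflict is an assumption, because the text does not state an order of precedence.

```python
def mn_stage(avid_node_count: int) -> int:
    """Map a count of 18F-FDG-avid nodes to mN0/mN1/mN2 (returned as 0, 1, 2)."""
    if avid_node_count < 0:
        raise ValueError("node count cannot be negative")
    if avid_node_count == 0:
        return 0
    if avid_node_count <= 2:
        return 1
    return 2


def nodal_response(mn_before: int, mn_after: int,
                   suv_before: float, suv_after: float) -> str:
    """Classify metabolic nodal response from mN stage and nodal SUVmax.

    Assumes avid nodes at baseline (suv_before > 0).
    Assumption: progression is checked first if criteria conflict.
    """
    pct_change = 100.0 * (suv_after - suv_before) / suv_before
    if mn_after > mn_before or pct_change >= 30.0:
        return "PMD"      # progressive metabolic disease
    if mn_after < mn_before or pct_change <= -30.0:
        return "CMR/PMR"  # complete or partial metabolic response
    return "SMD"          # stable metabolic disease


# Invented example: 3 avid nodes before NAC, 1 after, nodal SUVmax 8.0 -> 7.0
print(nodal_response(mn_stage(3), mn_stage(1), 8.0, 7.0))
```

In this sketch a drop in mN stage alone is enough to count as a response, even when the nodal SUVmax falls by less than 30%.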
For examinations using the second PET/CT scanner, additional variables were generated by 2 authors: metabolic tumor volume (MTV), SUVmean, SUVpeak, and tumor glycolytic volume (TGV) mean and max (TGVmax/mean). MTV was measured using a fixed-threshold technique (SUV ≥ 4). TGVmean was calculated manually as the product of MTV and SUVmean. TGVmax was calculated as the product of MTV and SUVmax. mTR was quantified using absolute changes (∆%) and thresholds defined previously (PERCIST and MUNICON criteria; SUVmax) (13); additionally, new thresholds were generated by receiver-operator characteristics.
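As a rough illustration of the composite metrics described above, the following Python sketch derives MTV, TGVmean, and TGVmax from a list of voxel SUVs using the fixed threshold (SUV ≥ 4) and computes a percentage reduction between two scans. The function names and the voxel values are hypothetical; the voxel volume is taken from the matrix dimensions quoted for the second scanner, and real segmentation would of course operate on a delineated 3D region rather than a flat list.

```python
# Voxel volume in mL for 1.95 x 1.95 x 3.3 mm voxels (second scanner).
VOXEL_ML = 1.95 * 1.95 * 3.3 / 1000.0


def tumor_metrics(voxel_suvs, voxel_volume_ml=VOXEL_ML, threshold=4.0):
    """Return MTV, SUVmean, SUVmax, TGVmean and TGVmax for SUV >= threshold."""
    inside = [s for s in voxel_suvs if s >= threshold]
    if not inside:
        return {"MTV": 0.0, "SUVmean": 0.0, "SUVmax": 0.0,
                "TGVmean": 0.0, "TGVmax": 0.0}
    mtv = len(inside) * voxel_volume_ml
    suv_mean = sum(inside) / len(inside)
    suv_max = max(inside)
    return {"MTV": mtv, "SUVmean": suv_mean, "SUVmax": suv_max,
            "TGVmean": mtv * suv_mean,   # product of MTV and SUVmean
            "TGVmax": mtv * suv_max}     # product of MTV and SUVmax


def percent_reduction(before: float, after: float) -> float:
    """Percentage reduction from the staging to the restaging value."""
    return 100.0 * (before - after) / before


# Invented voxel values, with a 1 mL voxel for easy arithmetic.
m = tumor_metrics([2.0, 4.0, 6.0, 8.0], voxel_volume_ml=1.0)
print(m["MTV"], m["TGVmean"], m["TGVmax"])
print(percent_reduction(10.0, 2.5))
```

The same `percent_reduction` helper applies to any of the metrics (SUVmax, length, MTV, TGV), which is how a ΔMTV or ΔTGV would be obtained in this sketch.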
Statistical Analysis
Analysis was performed using R software (version 3.0.2) (31). Correction for multiple comparisons was performed using the Bonferroni method (32) or false-discovery rate using FDRtool version 1.2.12 (33). For regression, continuous variable distribution was assessed using density plots and transformed (age²; logSUVmax/mean/peak and time to restaging/surgery). Multivariate analysis included all variables (including PET/CT scanner) after exclusion of perfect separators. Receiver-operator-characteristic optimal thresholds were calculated and compared with pROC (34) (95% confidence intervals using 200 iterations of 0.632 bootstrapping). Sensitivities and specificities were compared using the McNemar test (DTComPair version 1.0.3) (35).
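To make the idea of a receiver-operator-characteristic "optimal threshold" concrete, the sketch below scans candidate cut-offs and keeps the one that maximizes Youden's J (sensitivity + specificity − 1). This is only one common criterion and is an assumption here: the text does not state which criterion was applied, and the analysis itself was done in R with pROC, not in Python. The data are invented for illustration.

```python
def sens_spec(values, labels, threshold):
    """Sensitivity and specificity of the rule 'value >= threshold' => responder."""
    tp = sum(1 for v, y in zip(values, labels) if y and v >= threshold)
    fn = sum(1 for v, y in zip(values, labels) if y and v < threshold)
    tn = sum(1 for v, y in zip(values, labels) if not y and v < threshold)
    fp = sum(1 for v, y in zip(values, labels) if not y and v >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_threshold(values, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J."""
    best = None
    for t in sorted(set(values)):
        se, sp = sens_spec(values, labels, t)
        j = se + sp - 1.0
        if best is None or j > best[0]:
            best = (j, t, se, sp)
    return best[1:]


# Invented data: % reduction in SUVmax and pathologic response (1 = pTR).
reductions = [85, 80, 78, 60, 40, 35, 20, 10]
responder = [1, 1, 1, 0, 0, 1, 0, 0]
print(youden_threshold(reductions, responder))
```

A threshold chosen for maximal sensitivity alone would sit much lower than one chosen to balance sensitivity and specificity, which is the distinction drawn later between the PERCIST-like and the higher thresholds.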
Model Development, Tuning, Validation, and Performance
Three techniques were used as previously described (16): logistic regression (backward stepwise binary logistic), decision-tree analysis (recursive partitioning using loss matrices), and artificial neural networks (feed forward back-propagation multilayer perceptron). Models were tuned, generated, and validated internally (0.632 bootstrapping) using a development group (patients staged/restaged using the more recent scanner) and validated independently (patients staged/restaged using the earlier scanner; validation group). We partitioned patients in this way to minimize any potential bias, to ascertain immediate clinical utility, and also to assess generalizability to a different scanner system.
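For readers unfamiliar with 0.632 bootstrapping, the following Python sketch shows the general shape of the estimator: the apparent (training) error is blended with the error measured on patients left out of each bootstrap resample, weighted 0.368 and 0.632. It is a simplified variant that averages the out-of-bag error per resample, and the single-threshold "model" is a stand-in chosen for brevity, not one of the three techniques used in the study. All names and data are hypothetical.

```python
import random


def error_rate(threshold, data):
    """Fraction of (value, label) pairs misclassified by 'value >= threshold'."""
    return sum((v >= threshold) != bool(y) for v, y in data) / len(data)


def fit_threshold(data):
    """Toy model: the observed value that minimizes training error."""
    candidates = sorted({v for v, _ in data})
    return min(candidates, key=lambda t: error_rate(t, data))


def bootstrap_632(data, n_boot=200, seed=0):
    """Simplified 0.632 bootstrap estimate of the misclassification rate."""
    rng = random.Random(seed)
    n = len(data)
    apparent = error_rate(fit_threshold(data), data)
    oob_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if not oob:                                  # no held-out cases
            continue
        model = fit_threshold([data[i] for i in idx])
        oob_errors.append(error_rate(model, oob))
    oob_mean = sum(oob_errors) / len(oob_errors)
    return 0.368 * apparent + 0.632 * oob_mean


# Invented data: (% reduction in SUVmax, pathologic response).
data = [(85, 1), (80, 1), (78, 1), (75, 1), (40, 0), (35, 0), (20, 0), (10, 0)]
print(bootstrap_632(data))
```

The fixed seed keeps the example reproducible; the 200 resamples mirror the iteration count quoted in the statistical analysis.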
RESULTS
Three hundred two patients underwent resection after NAC. TRG was available for 301 (Tables 1–3). pTR was evident in 82 patients (27.2%): TRG 1 in 14 (4.65%), TRG 2 in 13 (4.32%), TRG 3 in 55 (18.3%), TRG 4 in 152 (50.5%), and TRG 5 in 67 (22.3%).
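The TRG breakdown above can be checked with a few lines of arithmetic. The counts are those reported in the text; the variable names are illustrative, and pTR is taken as TRG 1–3 as defined in the methods.

```python
# TRG counts as reported for the 301 patients with an available TRG.
trg_counts = {1: 14, 2: 13, 3: 55, 4: 152, 5: 67}

total = sum(trg_counts.values())
responders = sum(n for grade, n in trg_counts.items() if grade <= 3)  # pTR = TRG 1-3
ptr_rate = 100.0 * responders / total

for grade, n in sorted(trg_counts.items()):
    print(f"TRG {grade}: {n} ({100.0 * n / total:.1f}%)")
print(f"pTR: {responders}/{total} ({ptr_rate:.1f}%)")
```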
Predicting Pathologic Response Before NAC
Although there were nominally significant associations between tumor anatomic location and response, on multivariate regression, the only variable that predicted pTR was the use of a triple-agent NAC regimen: odds ratio (OR), 5.98 (95% confidence interval, 2.44–14.7; P = 8.94 × 10⁻⁵; Table 4).
Predicting Pathologic Response After NAC Using Absolute PET Variables
A more 18F-FDG–avid primary tumor after NAC, as quantified by all metrics, was negatively associated with pTR: logSUVmax OR, 3.84 × 10⁻⁴ (1.17 × 10⁻⁵ to 2.00 × 10⁻³; P = 9.89 × 10⁻⁶; Table 5; Supplemental Table 1 [supplemental materials are available at http://jnm.snmjournals.org]).
Predicting Pathologic Response Using Metabolic Tumor Response
mTR predicted pTR (Tables 3 and 6; Supplemental Table 2). This was true both for ΔSUVmax and for Δlength, independently on regression: logΔSUVmax OR for each percentage reduction, 1.03 (1.01–1.06), P = 3.24 × 10⁻³; Δlength OR, 1.02 (1.00–1.03), P = 0.019. Interestingly, whereas a PERCIST ≥ 30% reduction was associated with pTR, the MUNICON ≥ 35% threshold was not, once adjusted for Δlength. All additional metrics of mTR were associated with pTR.
Predicting Pathologic Response Using mNR
mNR was associated with pTR using the Fisher exact test (Table 3) but not on multivariate regression (Table 6). Notably, mNR and pTR were discordant in 42 of 220 (19.1%) patients (Table 7). In 41 cases, there was a nodal complete metabolic response or partial metabolic response without pTR, representing 51.2% of the 82 patients with 18F-FDG–avid nodes before NAC (Table 7).
mTR and mNR were also compared (Table 7) and were found to be discordant in 13 (5.90%) cases overall, representing 15.9% of patients with 18F-FDG–avid nodes before NAC. Typically, discordance arose because of mNR in the absence of mTR (10 cases; 4.6% and 12.2%, respectively).
Defining Optimal Metabolic Response Thresholds
The accuracy of each continuous (nonthreshold) metric of mTR in predicting pTR is shown in Supplemental Table 3: all were moderately discriminant (80.2%–84.4%), with no statistically significant differences.
The optimal thresholds for each metric of mTR were determined (Supplemental Table 3), for discrimination, sensitivity, and specificity. The optimal ΔSUVmax for sensitivity was a 27.4%–30.6% reduction, identical to PERCIST (30%) and similar to the MUNICON threshold (35%). However, specificity was minimal: 33.0% (23.8–42.6) and 41.8% (32.0–52.2), respectively. By contrast, the optimal ΔSUVmax threshold for balancing sensitivity (73.6% [58.6–82.7]) and specificity (84.5% [78.7–89.1]) was dramatically different: a 77.8% reduction. Rounded down to a more pragmatic 75.0%, sensitivity was identical, whereas specificity reduced slightly to 84.0%.
The ability of each mTR metric to predict pTR is shown in Supplemental Tables 4–6. Overall, a ΔSUVmax of 77.8% was significantly more discriminant, with a higher negative predictive value than the PERCIST (30%) and MUNICON (35%) thresholds. The same was true for ΔMTV, ΔTGVmax, and ΔTGVmean. The highest sensitivities were seen with PERCIST (sensitivity, 100%), MUNICON (97.1%), ΔMTV (97.1%), ΔTGVmax (97.1%), and ΔTGVmean (94.3%); these were significantly more sensitive than Δlength (P < 4.68 × 10⁻³; false-discovery rate = 0.046) but not a ΔSUVmax of 77.8%. The most specific were a ΔSUVmax of 77.8% (81.7% specific) and Δlength of 53.1% (82.7%) (P < 4.11 × 10⁻⁴).
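Comparisons of sensitivity such as those above are paired: both thresholds are applied to the same responders. A minimal sketch of an exact (binomial) McNemar test for that situation is given below. It is illustrative only; the study used the DTComPair package in R, whose exact procedure may differ, and the discordant counts in the example are invented.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the discordant pair counts.

    b: responders classified correctly by threshold A only
    c: responders classified correctly by threshold B only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to k, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Invented example: 10 responders detected by one threshold only, none by the other.
print(mcnemar_exact(0, 10))
```

Only the discordant pairs carry information in this test; patients classified identically by both thresholds do not enter the calculation.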
Performance of Predictive Models
Models were generated (Supplemental Table 7) using metrics of mTR/mNR. The most successful was a logistic regression model comprising Δlength + ΔSUVmax; this was highly sensitive (91.4%), moderately specific (71.4%) and discriminant (0.814), and this sensitivity persisted during internal and independent validation (although with relatively poor specificity and discrimination). However, ultimately none of the composite models outperformed individual mTR thresholds (Supplemental Tables 4–7).
DISCUSSION
In this study of 301 patients treated with NAC and surgery—the largest to date in esophageal cancer—we found no baseline clinical, tumor, or PET variables associated with pTR. This is perhaps unsurprising, reflecting the daunting complexity involved. Chemoresistance is usually multifactorial and constitutes a spectrum of sensitivity, which depends on numerous macroscopic, microscopic, and molecular factors modulating chemotoxicity (36,37). Intratumoral heterogeneity further complicates chemoresistance, with several subclones, potentially demonstrating differential response and baseline characteristics, in addition to heterogeneity between tumor and nodal metastases. In contrast, after NAC, several PET variables, including absolute tumor metrics and those assessing either mTR or mNR, were strongly associated with pTR on multivariate analysis, and several clinically relevant implications were identified.
First, the identification of a significantly better ΔSUVmax threshold (77.8% reduction) than the generic PERCIST threshold (30%) suggests that the latter should be raised considerably for esophageal cancer to improve stratification of mTR (perhaps to a more pragmatic 75%). This threshold was nominally significantly better than the MUNICON threshold (35%), but as this threshold was originally derived during therapy rather than after therapy (as in our study), the significance of this is uncertain and we are unable to draw further conclusions.
Second, rather than considering avidity in isolation, we found evidence that incorporating spatial data improved prediction: Δlength at a most basic level, or ideally a composite metric such as ΔMTV or ΔTGVmax/mean. These outperformed the existing recommended PERCIST threshold of a 30% SUVmax reduction. They were comparably sensitive, but more specific (P < 4.11 × 10⁻⁴) and discriminant (P < 9.38 × 10⁻⁵), and were supported by internal (bootstrapping) validation. This suggests that composite metrics may have greater predictive ability in clinical trials than ΔSUVmax alone (such as the 35% threshold used in the MUNICON trial). In particular, their superior specificity and high negative predictive value (98.5%–100%) might identify more nonresponders suitable for cessation of therapy. These findings are in keeping with those of recent smaller studies in chemoradiotherapy: in 20 patients, using support vector machines and logistic regression, Zhang et al. found that mTR quantified using spatial avidity metrics outperformed avidity alone in predicting pTR (38), whereas in 37 patients Jayachandran et al. found MTV to outperform SUVmax (39).
Third, to our knowledge, this is the first study to assess the novel concept of mNR in association with pTR. We found that the primary tumor and nodal disease often demonstrated a discordant response to NAC, with mNR seen in the absence of mTR or pTR. With the use of mTR alone (as in the MUNICON trial), this subgroup of patients would be classed as nonresponders and NAC aborted; our findings suggest that in such patients their nodal metastases may in fact be responding to treatment. Nodal metastases by definition contain an aggressive subpopulation of cancer clones originating from the primary tumor, which then evolve differently at a genetic and phenotypic level (40). A crucial such phenotype is chemosensitivity. Although clearly mNR is likely an imperfect surrogate of pathologic nodal response, no systems for assessing nodal response are in use. Our findings are important, as they offer a vital insight into assessing response in the tumor subclones with proven metastatic behavior, likely to be responsible for local and distant disease relapse.
This study has several limitations. Although the current gold standard technique for disease response assessment is direct histopathologic examination, this remains imperfect. We used the Mandard classification, which originally described the response of esophageal squamous cell carcinoma to cisplatin-based NACR (27). The Mandard TRG has subsequently been validated for esophageal adenocarcinoma (41) although several other classifications have been described (42); all, however, remain relatively subjective and are tempered by potential interobserver variability and intratumoral sampling bias (43). Ultimately, the Mandard TRG is most frequently used and provides the basis for optimal prediction of survival (28,30). An additional limitation of this study is its retrospective design over a long time period, which although necessary to generate a sufficient cohort resulted in a change of PET/CT scanner, and the availability of additional metrics for the more recent scanner alone. In addition, we included a range of cell types, rather than restricting our analysis. We sought to mitigate these limitations with dedicated review of TRG by a single expert pathologist, by adjusting analyses for cell type and the scanner used and by restricting model development to the more recent representative scanner with subsequent validation in the earlier group, to minimize any bias. We also performed a post hoc analysis comparing metrics between scanners, demonstrating no significant differences in either metabolic response of the primary or nodal tumor (P = 0.109 [Mann–Whitney] and 0.068 [Fisher exact test]). We believe this to be the largest study performed for esophageal cancer and believe that our results are robust—whether they can be extrapolated to NACR is not clear, but we believe these results warrant urgent assessment. 
In addition, we did not assess several textural response parameters, including entropy and run-length matrices. Although not routinely used in clinical practice, these have recently been shown to be associated with pTR after NACR (44), and their inclusion in conjunction with volume has been suggested to improve prognostication (45). Such metrics may therefore provide complementary predictive data.
CONCLUSION
We found that the current definitions used for metabolic response assessment after NAC, based solely on ΔSUVmax, are both suboptimal and too simplistic and that using composite measures of 18F-FDG avidity and volume could significantly improve the predictive ability of PET. The assessment of nodal response, which is often discordant with the primary tumor response, should be urgently studied, as it may offer the potential to further improve response assessment, specifically within tumor populations with proven metastatic behavior.
DISCLOSURE
John M. Findlay is supported by the NIHR Oxford Biomedical Research Centre. Fergus V. Gleeson is a paid consultant to Alliance Medical. Mark R. Middleton is a paid consultant/advisor to Amgen, BMS, GSK, Merck, and Millennium and has received institutional funding from Amgen, AZ, BMS, Clovis, Eisai, GSK, Immunocore, Johnson & Johnson, Merck, Millennium, Novartis, Pfizer, Roche, and Vertex. No other potential conflict of interest relevant to this article was reported.
Footnotes
Published online Sep. 15, 2016.
- © 2017 by the Society of Nuclear Medicine and Molecular Imaging.
- Received for publication March 29, 2016.
- Accepted for publication July 13, 2016.