Abstract
The optimal methodology for defining response with 18F-FDG PET after curative-intent chemoradiation for non–small cell lung cancer (NSCLC) is unknown. We compared survival outcomes according to the criteria of the European Organization for Research and Treatment of Cancer (EORTC), PERCIST 1.0, the Peter Mac metabolic visual criteria, and the Deauville criteria. Methods: Three prospective trials of chemoradiation for NSCLC, involving baseline and posttreatment 18F-FDG PET/CT imaging, were conducted between 2004 and 2016. Responses were categorized as complete metabolic response (CMR), partial metabolic response, stable metabolic disease, or progressive metabolic disease. Cox proportional-hazards models and log-rank tests assessed the impact of each response on overall survival (OS). Results: Eighty-seven patients underwent 18F-FDG PET/CT before and after radical chemoradiation for NSCLC. Follow-up 18F-FDG PET/CT scans were performed at a median of 89 d (interquartile range, 79–93 d) after radiotherapy. Median follow-up and OS after PET response imaging were 49 and 28 mo, respectively. Interobserver agreements for EORTC, PERCIST, Peter Mac, and Deauville had κ values of 0.76, 0.76, 0.87, and 0.84, respectively. All 4 response criteria were significantly associated with OS. Peter Mac and Deauville showed better fit than EORTC and PERCIST and distinguished better between CMR and non-CMR. Conclusion: All 4 response criteria were highly predictive of OS, but visual criteria showed greater interobserver agreement and stronger discrimination between CMR and non-CMR, highlighting the importance of visual assessment to recognize radiation pneumonitis, changes in lung configuration, and patterns of response.
Imaging with 18F-FDG PET plays an integral role in multidisciplinary treatment decision making, at diagnosis, restaging, or therapeutic response assessment, in patients with non–small cell lung cancer (NSCLC). Reliable early response assessment after curative-intent therapy would not only identify patients with persistent or progressive disease who require further treatment but also could avoid overtreatment of patients who may already be cured. Structural imaging, including CT or MRI, has significant limitations after treatment with curative-intent radiotherapy or chemoradiation, especially in lung malignancies. Tumors may be obscured by atelectasis, pneumonitis, or fibrosis. In addition, tumor masses often regress gradually over several months, and residual opacity or scarring may persist, mandating serial measurements to assess response. 18F-FDG PET/CT can overcome many of the limitations of structural imaging in response assessment, but the optimum method for characterizing response to predict overall survival (OS) is not established.
Although survival remains poor in locally advanced NSCLC treated with chemoradiation (1), the addition of adjuvant immunotherapy in stage III NSCLC has lately demonstrated a significant increase in progression-free survival (2). For decision making about treatment intensification with immunotherapy, targeted therapies, localized radiotherapy dose escalation, or even salvage surgery (3), a personalized risk-adapted approach, based on accurate early assessment of response after chemoradiation, would be logical.
The necessity for standardization of 18F-FDG PET reporting is well recognized, and different groups have developed criteria to categorize 18F-FDG PET response assessment, with the objective of increasing both intra- and interobserver reproducibility. Two semiquantitative response criteria have been proposed, the European Organization for Research and Treatment of Cancer (EORTC) criteria (4) and PERCIST 1.0 (5,6). The first visual response criteria (Peter Mac) for categorizing PET responses after radiotherapy were reported in 2003 (7). Deauville visual response criteria (8–10) were developed specifically for lymphoma and to our knowledge have never been studied in NSCLC. The semiquantitative approaches (EORTC, PERCIST) are often assumed to be more accurate and reproducible than visual qualitative interpretations (Peter Mac, Deauville). However, semiquantitative methods can be labor-intensive for daily clinical practice and have not been systematically compared with rigorous visual criteria.
In this study, we compared the EORTC, PERCIST, Peter Mac, and Deauville criteria for NSCLC response assessment after radical radiotherapy or chemoradiation to discover which method is best able to predict OS.
MATERIALS AND METHODS
Patients
Between 2004 and 2016, 3 prospective trials at our institution enrolled NSCLC patients treated with definitive radiotherapy or chemoradiation. Seventy-six patients were enrolled in a PET-planning study (11) (Peter Mac protocol 03/55), 60 patients in the 3′-deoxy-3′-18F-fluorothymidine (18F-FLT)/18F-FDG study (12) (Australian Clinical Trials Registry no. ACTRN12611001283965), and 60 patients in the 68Ga-ventilation/perfusion PET study (13) (universal trial number U1111-1138-4421). All patients were at least 18 y old, had an Eastern Cooperative Oncology Group performance status of 0–2, and had histologically or cytologically confirmed NSCLC. All cases were reviewed by a multidisciplinary lung tumor board. Follow-up schedules were uniform in these patient cohorts, with reviews every 3 mo for 2 y and every 6 mo until 5 y. Patients eligible for this analysis had undergone pre- and posttreatment 18F-FDG PET/CT imaging, the latter acquired between 1.5 and 4 mo after radiotherapy. Patients presenting with metastatic disease, local recurrence, or complete surgical resection before definitive chest radiotherapy or chemoradiation were ineligible. The institutional review board and the Peter MacCallum Cancer Clinical Research and Ethics Committee approved this retrospective study and waived the requirement to obtain informed consent. All patients had previously provided written informed consent to undergo their respective prospective studies.
Treatment Policy
All patients were treated to 50–60 Gy using 3-dimensional conformal or intensity-modulated radiotherapy planning according to institutional guidelines. Patients underwent staging, simulation, and treatment with arms raised and breathing freely. Tumor motion was accounted for using the 18F-FDG PET/CT planning scan (11) or with a 4-dimensional planning CT scan. Concomitant chemotherapy was administered, either cisplatin/etoposide or carboplatin/paclitaxel.
PET Scanning Acquisition and Processing
All 18F-FDG PET/CT scans were acquired on an integrated PET/CT scanner including a Discovery LS (GE Healthcare), an STE (GE Healthcare), or a Biograph 16 (Siemens Medical Solutions). Each baseline scan and posttreatment 18F-FDG PET/CT scan were performed on the same scanner with a uniform protocol. Patients fasted for at least 6 h and underwent blood glucose measurement before administration of 4.2 MBq of 18F-FDG per kilogram of body weight. The emission scan commenced 60–70 min later.
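As a worked illustration of the weight-based dosing stated above, the following minimal sketch computes the administered activity at 4.2 MBq/kg. The function name and the 70 kg example weight are hypothetical and are not taken from the study protocol.

```python
def fdg_activity_mbq(weight_kg: float, mbq_per_kg: float = 4.2) -> float:
    """Return the weight-based 18F-FDG activity in MBq.

    The default of 4.2 MBq/kg is the figure stated in the text; the
    function itself is an illustrative helper, not part of the protocol.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return weight_kg * mbq_per_kg


# Hypothetical 70 kg patient: 70 kg x 4.2 MBq/kg
print(round(fdg_activity_mbq(70.0), 1))
```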
Assessment of Treatment Response
All semiquantitative analyses and qualitative assessments were performed using MIM software (version 5.4.4; MIM Software). For each patient, the 4 18F-FDG PET/CT response criteria were reported retrospectively without knowledge of the outcome. Two readers assessed Peter Mac and Deauville criteria, and 2 assessed EORTC criteria and PERCIST. Readers were paired into groups based on number of years of experience with 18F-FDG PET/CT response assessment (>10 y or <5 y). Responses to therapy were categorized as complete metabolic response (CMR), partial metabolic response, stable metabolic disease, or progressive metabolic disease as described in Supplemental Table 1 (supplemental materials are available at http://jnm.snmjournals.org) and as illustrated in Figure 1.
Interobserver agreement between the 2 respective observers was calculated for each criterion. For the EORTC criteria and PERCIST, all cases were read independently. For the Peter Mac and Deauville criteria, the 2 readers assessed the first 19 cases together to agree on the interpretation of the Deauville criteria, because these criteria are not usually applied to cancers other than lymphoma. The remaining cases were assessed independently. For the criteria comparison, all discrepant cases were discussed between the 2 respective observers and a consensus classification was reached.
Statistical Methods
OS was measured from the date of posttreatment 18F-FDG PET/CT to the date of death. Patients alive at the last contact had their survival censored at that date. Kaplan–Meier methods were used to describe the survival curves for each of the 4 response criteria and also grouped as CMR versus non-CMR. Cox proportional-hazards models were used to estimate the hazard ratios; calculate the c-statistic, Akaike information criterion, and r2; and assess the impact of each criterion on OS. Univariable and multivariable results adjusting for sex, histology, age, performance status, stage, weight loss, and treatment are provided. Cohen κ was used to assess the paired concordance and the interobserver agreement of each criterion. The Fisher exact test was used to compare the CMR rate of responses assessed before 90 d versus at or after 90 d from the last day of radiotherapy. All statistical analyses were performed in R, version 3.2.3 (R Foundation for Statistical Computing).
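The OS definition and censoring rule described above can be sketched as follows. This is a minimal, standard-library Python illustration of the Kaplan–Meier product-limit estimator, not the authors' R code; the function names, the month conversion factor, and the toy observations are assumptions made for illustration only.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # assumed conversion; the text does not specify one


def os_observation(pet_date, death_date, last_contact):
    """Return (months, event) measured from the posttreatment PET date.

    event is 1 if the patient died, or 0 if censored at last contact.
    """
    if death_date is not None:
        return (death_date - pet_date).days / DAYS_PER_MONTH, 1
    return (last_contact - pet_date).days / DAYS_PER_MONTH, 0


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing this time; deaths at t are counted
        # against everyone still at risk at t, including those censored at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


# Toy data (hypothetical, in months): 1 = death, 0 = censored
times = [6, 12, 12, 20, 30]
events = [1, 1, 0, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Running the estimator separately on CMR and non-CMR patients would give the two curves that the dichotomized comparison relies on.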
RESULTS
Patients
There were 87 patients eligible for analysis: 8 from the PET-planning study, 47 from the 18F-FLT/18F-FDG study, and 32 from the 68Ga-ventilation/perfusion PET study. Baseline and treatment characteristics are described in Table 1. Supplemental Table 2 presents the patients’ glucose levels at the pretreatment and posttreatment 18F-FDG PET/CT scans. In the PET-planning study, response assessment with 18F-FDG PET/CT was optional and only 21 patients were potentially eligible for analysis. However, only 8 patients’ images could be retrieved from PACS archives. In the 18F-FLT/18F-FDG and GALLIPET-VQRT studies, the posttreatment 18F-FDG PET/CT was mandatory. For these studies, patients were ineligible if they lacked posttreatment 18F-FDG PET/CT (n = 30), had M1 disease at baseline (n = 4), had undergone radiation for recurrent disease (n = 5), had small cell lung cancer (n = 1), or had received adjuvant radiotherapy for microscopic disease (n = 1). All patients completed their planned irradiation.
Follow-up 18F-FDG PET/CT scans were performed at a median of 89 d (interquartile range, 79–93 d; full range, 47–123 d) after radiotherapy. The CMR rate using the EORTC criteria did not differ (P = 0.64) between patients who underwent posttreatment 18F-FDG PET/CT within 47–89 d (CMR = 25%) and those scanned within 90–132 d (CMR = 30%). There was also no difference when response was assessed using the other 3 criteria (results not shown).
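The comparison of CMR rates by scan timing uses the Fisher exact test on a 2 × 2 table. A minimal two-sided implementation in standard-library Python is sketched below. The counts in the example are hypothetical, chosen only to give 25% and 30% CMR rates; they are not the study data and are not expected to reproduce the reported P value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))  # small tolerance for float ties
    return min(1.0, total)


# Hypothetical counts: rows = early vs late scan, columns = CMR vs non-CMR
early_cmr, early_non_cmr = 5, 15   # 25% CMR
late_cmr, late_non_cmr = 6, 14     # 30% CMR
print(fisher_exact_two_sided(early_cmr, early_non_cmr, late_cmr, late_non_cmr))
```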
Interobserver Agreement
The EORTC and PERCIST interobserver agreement was assessed for all 87 patients. The weighted κ for EORTC, PERCIST, Peter Mac, and Deauville was, respectively, 0.76 (95% confidence interval [CI], 0.63–0.89), 0.76 (95% CI, 0.62–0.89), 0.87 (95% CI, 0.75–0.99), and 0.84 (95% CI, 0.70–0.91). All subsequent analyses were based on the consensus decisions.
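For readers unfamiliar with the weighted κ statistic reported above, the following standard-library Python sketch shows one common formulation for ordered categories. The text does not state which weighting scheme was used, so linear weights are an assumption here; the category order and the reader labels in the example are likewise hypothetical.

```python
def weighted_kappa(rater1, rater2, categories, weights="linear"):
    """Cohen kappa for two raters over ordered categories.

    weights: "linear", "quadratic", or "unweighted".
    Computed as 1 - (weighted observed disagreement) /
                    (weighted chance-expected disagreement).
    """
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least 2 categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater1)

    # Joint proportion table of the two raters' classifications
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater1, rater2):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "unweighted":
            return float(i != j)
        d = abs(i - j) / (k - 1)
        return d * d if weights == "quadratic" else d

    num = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - num / den


# Hypothetical reads by two observers (ordered from best to worst response)
order = ["CMR", "PMR", "SMD", "PMD"]
reader_a = ["CMR", "PMR", "PMR", "SMD", "PMD", "CMR"]
reader_b = ["CMR", "PMR", "SMD", "SMD", "PMD", "PMR"]
print(round(weighted_kappa(reader_a, reader_b, order), 3))
```

With linear weights, a CMR-versus-PMR disagreement is penalized less than a CMR-versus-PMD disagreement, which is why a weighted κ suits ordered response categories.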
Intercriteria Agreement
All scans were evaluated except in one case in which PERCIST response was not assessable because of missing patient data. For EORTC, PERCIST, Peter Mac, and Deauville, CMR, partial metabolic response, and stable metabolic disease were reported in, respectively, 24, 37, and 9 patients; 24, 38, and 7 patients; 30, 39, and 1 patients; and 31, 38, and 1 patients. In all criteria, progressive metabolic disease was reported in 17 patients with new lesions outside the treatment fields.
Both semiquantitative criteria (EORTC and PERCIST) and both qualitative response criteria (Peter Mac and Deauville) showed almost perfect agreement with each other, with κ values of, respectively, 0.95 (95% CI, 0.89–1.00) and 0.98 (95% CI, 0.95–1.00). Agreement between the semiquantitative and the qualitative criteria was lower (Table 2). When EORTC and PERCIST were discordant with Deauville and Peter Mac (Fig. 2), the former two underestimated the visual criteria response in all but 1 case.
OS by Response
The estimated median follow-up was 49 mo, and the median survival, calculated from the date of the follow-up 18F-FDG PET/CT, was 28 mo. All 4 response criteria were associated with OS (Table 3). Overall, the Peter Mac and the Deauville criteria showed stronger associations than EORTC and PERCIST (higher c-statistic, higher r2, and lower Akaike information criterion; Supplemental Table 3). OS for each response criterion is shown in Figure 3.
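The c-statistic used above to compare model fit measures how often the patient with the higher predicted risk dies first. The sketch below computes Harrell's concordance index directly from pairs in standard-library Python. It is a simplified illustration (pairs with tied observed times are ignored), not the authors' R implementation, and the toy data and risk coding are hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has the shorter observed
    time and died (events[i] == 1). It is concordant when subject i also
    has the higher risk score; tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): risk 1 = non-CMR, risk 0 = CMR
times = [5, 10, 15, 20]   # months from posttreatment PET
events = [1, 1, 0, 1]     # 1 = death, 0 = censored
risk = [1, 1, 0, 0]
print(harrell_c(times, events, risk))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, so a higher c-statistic for a response criterion indicates better separation of survival outcomes.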
The predicted 2-y OS for CMR versus non-CMR was, respectively, 76% (95% CI, 60%–97%) versus 51% (95% CI, 39%–66%) for EORTC, 76% (95% CI, 60%–97%) versus 50% (95% CI, 38%–65%) for PERCIST, 85% (95% CI, 73%–100%) versus 44% (95% CI, 32%–60%) for Peter Mac criteria and 82% (95% CI, 69%–98%) versus 45% (95% CI, 33%–61%) for Deauville criteria.
DISCUSSION
The intention of definitive radiotherapy and chemoradiation in unresectable NSCLC is permanent eradication of disease. Ideally, therapeutic response assessment should reproducibly identify a group of patients with an excellent prognosis, many of whom are likely to be cured, and identify those with partial responses or progressive disease who may benefit from early intervention with additional therapies. In this setting, the reliability of the assessment of CMR is critical. Although CMR patients may still harbor subclinical disease and may benefit from additional therapy, it will be important to avoid overtreatment in a group in which many are already cured.
To harmonize 18F-FDG PET/CT reporting and determine which 18F-FDG PET/CT response criteria are most predictive for NSCLC, 2 semiquantitative (EORTC and PERCIST) and 2 qualitative (Peter Mac and Deauville) response criteria were compared. Reassuringly, all 4 response criteria showed highly significant associations with OS. These findings are consistent with our previous 2003 report, that qualitative interpretation of metabolic response at a single time-point early after curative-intent radiotherapy or chemoradiation provides powerful predictive information, stratifying patients into groups with widely differing survival probabilities (7,14). An early posttreatment 18F-FDG PET scan was more powerfully predictive of OS than CT imaging, stage, or performance status.
Prognostic stratification using 18F-FDG PET/CT response has been explored in different settings in NSCLC. Two pathologic validation studies performed on patients treated with trimodality approaches demonstrated the association between maximum SUV and pathologic response after neoadjuvant treatment (3,15). After single-modality palliative chemotherapy, the 2 semiquantitative PET response criteria (EORTC and PERCIST) were shown to be more sensitive and accurate than RECIST 1.1 for the detection of an early therapeutic response (16,17). Fledelius et al. applied PERCIST and Peter Mac after induction chemotherapy and before radiotherapy in 21 NSCLC patients. Both criteria showed comparable results, with a strong association with OS (18). The same group evaluated different visual and semiquantitative methods as well as different response category cutoffs to evaluate the early treatment response of NSCLC after 7–10 d of erlotinib (19). In their analysis of 29 different methods and parameters, total lesion glycolysis used in combination with PERCIST and Peter Mac visual criteria were the best methods for predicting treatment responses.
Metabolic response assessment with 18F-FDG PET can yield false-positive results related to active inflammation, especially in the early postradiotherapy phase (20,21). Serial imaging indicates that inflammatory 18F-FDG uptake in normal tissues increases in the first few months after treatment rather than occurring early during radiotherapy (22).
In our study, the 2 visual criteria showed stronger associations with OS than EORTC and PERCIST. Although many aspects of PERCIST have been improved compared with the EORTC criteria, rigid interpretation of the SUV normalized to lean body mass (SUL) cannot account for postradiotherapy inflammatory changes, and alteration in the size of the region of interest can compromise assessment. Accurate measurement of SUL values, accurate assignment of region of interest, and accurate registration of the images from a series of examinations, particularly in the presence of 18F-FDG–avid pneumonitis and an evolving change in the lung configuration, can be challenging. In addition, the respiratory motion in the lung may superimpose inflammatory activity onto the adjacent residual mass on CT. Careful qualitative interrogation of PET images is critically important. PET physicians and radiologists must use their clinical expertise to interpret the distribution, shape, and variations of the 18F-FDG activity. The qualitative visual assessments of Peter Mac and Deauville allow this flexible interpretation, which in this study has translated into a superior prediction of OS and critically superior distinction between CMR and non-CMR.
To account for posttreatment inflammation, the Hopkins group incorporated into their visual criteria an intermediate response group, defined by diffuse 18F-FDG uptake above blood pool or liver activity (23). This group was classified as “probably inflammatory” and considered negative for malignancy for their final analysis. When this group was separately analyzed for OS association, a distinct intermediate survival curve was delineated. On the basis of semiquantitative response assessment, this group could be categorized as partial metabolic response, stable metabolic disease, or even progressive metabolic disease with misleading prognostication (Fig. 2A). In our study, the largest intercriteria response migration was seen in the group with stable metabolic disease, with the semiquantitative criteria assigning more patients to this group even though their survival curve was similar to that of CMR patients. This finding again underscores the importance of a meticulous visual assessment on sequential PET scans. Respiratory gating may help to minimize such discordance (24).
In our study, the EORTC and PERCIST criteria showed a κ of 0.95, corresponding to almost perfect agreement with each other. A metaanalysis of 6 studies that included 348 patients with solid tumors (13% with lung cancer) showed an almost perfect concordance between EORTC and PERCIST, with a κ of 0.946. Four additional studies confined to NSCLC (16,25–27) corroborate this excellent agreement (28).
The Peter Mac and Deauville responses were identical in all but 1 patient. This finding is unsurprising because these criteria are similar and are intended to be simple and reproducible. For CMR, Peter Mac criteria require the uptake to be equal to or less than uptake in reference tissue in which the baseline lesion is located. Because of low background activity in the normal lung parenchyma, blood pool is the reference tissue in chest malignancies. Deauville, however, uses the organs of reference (blood pool or liver activity) for comparison. In lymphoma clinical trials, a Deauville score of 1–3 (uptake below or equal to the liver; Supplemental Table 1) on posttreatment PET is usually considered CMR (29). Adopting from lymphoma, in this study a Deauville score of 3 (equal to liver) was considered CMR. In practice, the difference between metabolic activity in the blood pool and liver is rather small and may be difficult to differentiate visually, particularly in the presence of treatment-induced inflammatory changes. Using liver as the organ of reference provides a slightly higher visual range for the imaging specialist, which may translate to easier interpretation and potentially less subjectivity. In addition, Deauville criteria have the advantage of widespread familiarity among imaging specialists. Nonetheless, prospective studies are needed to validate Deauville criteria in lung cancer and other solid tumor malignancies.
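The dichotomization described above, in which a Deauville score of 1–3 is treated as CMR, can be written as a small lookup. This is a minimal sketch: the score descriptions paraphrase the widely used 5-point scale and are not quoted from the study (Supplemental Table 1 remains the authoritative definition), and the function name is hypothetical.

```python
# Paraphrased 5-point Deauville scale (assumption; see Supplemental Table 1
# of the study for the definitions actually applied).
DEAUVILLE_SCALE = {
    1: "no uptake above background",
    2: "uptake at or below mediastinal blood pool",
    3: "uptake above mediastinal blood pool but at or below liver",
    4: "uptake moderately above liver",
    5: "uptake markedly above liver and/or new lesions",
}


def is_cmr_by_deauville(score: int) -> bool:
    """Return True when the score counts as CMR (score 1-3, as in this study)."""
    if score not in DEAUVILLE_SCALE:
        raise ValueError("Deauville score must be an integer from 1 to 5")
    return score <= 3


for score, description in DEAUVILLE_SCALE.items():
    label = "CMR" if is_cmr_by_deauville(score) else "non-CMR"
    print(score, label, "-", description)
```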
By pairing the 2 observers in each group based on their years of experience in PET response assessment, we aimed to minimize the inevitable interobserver variability. All 4 criteria showed strong interobserver agreement. The slightly higher κ values of the Peter Mac criteria may reflect the greater familiarity of the observers with these criteria. In a study by Fledelius et al. after 2 cycles of chemotherapy (no radiation) in 35 NSCLC patients, 8 readers evaluated the response based on PERCIST and Peter Mac criteria (30). Both approaches showed strong interobserver agreement, but overall agreement was higher with PERCIST. Subjective variability in the inclusion of atelectatic changes in the region of interest for semiquantitative analysis was mentioned as one reason for the variability of response assessment by PERCIST. This is likely to be a more significant problem after radiotherapy because of its local inflammatory effects. Therefore, careful visual analysis is of paramount importance, regardless of the response criteria used.
This study has some limitations that should be acknowledged, including the retrospective nature of the analysis and the long time period over which scans were accrued. However, all PET scans were performed as part of prospective trials, harmonized protocols were applied within each, and all assessments were masked to the eventual outcomes.
CONCLUSION
In patients with NSCLC treated with definitive chemoradiation, qualitative and semiquantitative 18F-FDG PET/CT response criteria both provided powerful early, posttreatment predictive information. The Peter Mac and Deauville visual criteria showed stronger associations than the 2 semiquantitative criteria, EORTC and PERCIST. Regardless of the criteria used, careful and intelligent visual assessment of 18F-FDG PET/CT images is of paramount importance because of the commonly occurring postradiotherapy inflammatory changes.
DISCLOSURE
The 18F-FLT/18F-FDG study (ACTRN12611001283965) was supported by the National Health and Medical Research Council (APP1003895) and the Victorian Cancer Agency. The 68Ga-ventilation/perfusion PET study (U1111-1138-4421) was supported by the National Health and Medical Research Council (APP1038399) and the Cancer Australia Priority–driven Collaborative Cancer Research Scheme (project 1060919). David L. Ball has an advisory role for Pfizer Australia. Rodney J. Hicks has ownership interest in Telix Radiopharmaceuticals. Shankar Siva has an advisory role for Astellas and Janssen, receives research funding from and is on the speakers’ bureau of Varian Medical Systems and Merck Sharp & Dohme, and receives travel and accommodations expenses from Bristol-Myers Squibb. No other potential conflict of interest relevant to this article was reported.
Acknowledgments
We are grateful for the contributions of Lisa Selbie and Ann Officer, our radiation oncology data managers and clinical trial colleagues.
Footnotes
Published online Jul. 20, 2018.
Received for publication May 6, 2018.
Accepted for publication July 9, 2018.
© 2019 by the Society of Nuclear Medicine and Molecular Imaging.