Abstract
This study assesses the predictive value of 18F-FDG PET for overall survival in lung cancer patients treated with a targeted drug. Methods: 18F-FDG PET was performed in 125 second- or third-line non–small cell lung cancer (NSCLC) patients with a baseline Eastern Cooperative Oncology Group performance status less than 3 before treatment with erlotinib (150 mg daily) and 2 wk into treatment. The predictive value of 18F-FDG PET, clinical parameters, and epidermal growth factor receptor (EGFR) mutation status for survival duration was evaluated by fitting accelerated failure time models. Results: New lesions on PET at 2 wk, EGFR mutation status, performance status, and baseline tumor burden were independent and significant predictors of overall survival. Reduction of maximum standardized uptake value by at least 35% was predictive of survival only when EGFR mutation status was not accounted for. Conclusion: 18F-FDG PET in second- or third-line NSCLC patients at 2 wk after starting treatment with erlotinib carries information about overall survival. Parametric survival modeling enables a quantitative assessment of the predictive value of 18F-FDG PET in the context of clinical and laboratory information. New-lesion status by 18F-FDG PET at 2 wk is a potential surrogate biomarker for survival in NSCLC.
PET using 18F-FDG, especially combined with CT, is an established tool for the diagnosis and staging of a variety of cancer types (1). There is also increasing evidence that 18F-FDG PET can assess therapeutic response earlier than CT (2), enabling modification of ineffective therapy and potentially improving therapeutic outcomes. This utility hinges on a reliable and validated link between early 18F-FDG PET response and improved clinical outcome. Although such evidence is growing for several tumors, including lung cancer (3), and with treatments including radiotherapy and chemotherapy (4,5), it may not hold uniformly across indications and therapies. Here, using data from 2 global multicenter, phase II studies of second- or third-line non–small cell lung cancer (NSCLC) patients treated with a targeted agent, we investigate the extent to which survival duration can be predicted with 18F-FDG PET information obtained at baseline and early in treatment.
If a predictive link can be demonstrated, 18F-FDG PET may serve as a noninvasive biomarker for survival in NSCLC. In this context, 18F-FDG PET responses may serve as a criterion for moving novel anticancer therapies forward to more costly phases of development or to stop the development of ineffective compounds. This may be particularly important for molecularly targeted therapies, some of which may have cytostatic effects and therefore not lead to conventional radiologic response. Effects of targeted therapies often depend on expression levels and mutations of receptor and signaling proteins (6). Thus, although changes on 18F-FDG PET can occur within hours or days, well before significant cell loss, these may merely indicate effects on membrane transport or metabolism of glucose rather than cell kill (7), and it is not clear whether such responses are also predictive of survival.
While accounting for patient heterogeneity due to clinical and mutation status, we used multivariate survival analysis based on accelerated failure time (AFT) models (8,9) to relate changes observed on 18F-FDG PET to survival duration. In similar analyses of NSCLC (10,11) and colorectal cancer patients (12), AFT models were recently applied to assess the predictive value of CT tumor size measurements for survival. For both tumor types, these studies found tumor size before treatment, change of tumor size at 7 (12) or 8 wk into treatment (11), and the Eastern Cooperative Oncology Group (ECOG) performance status before treatment to be significant predictors of survival.
Because our analyses were performed on single-arm data, it was not possible to distinguish between predictive and prognostic effects. Therefore, throughout this article, our use of the term predictive refers primarily to a variable's statistical ability to inform about survival duration, independent of possible treatment effects. Moreover, therapeutic effects due to drug treatment are not discernible from our work.
The overall goal of our analyses was to determine whether 18F-FDG PET changes after 2 wk of treatment with a targeted drug can predict survival in NSCLC patients.
MATERIALS AND METHODS
Patients and Treatment
One hundred thirty-six patients with refractory or recurrent NSCLC after second- or third-line treatment were studied in 2 international, multicenter trials: a single-arm study of erlotinib (OSI3926g (13,14)) and the erlotinib arm of a 2-arm study comparing an antibody to the MET receptor plus erlotinib with erlotinib alone (OAM4558g (15)). All patients received erlotinib at a dose of 150 mg daily (orally) until occurrence of disease progression or severe side effects. Epidermal growth factor receptor (EGFR) mutation status was available for 100 patients. Mileshkin et al. (14) and Spigel et al. (15) provide more complete descriptions of the design of the 2 studies.
For exploratory purposes, OSI3926g and OAM4558g prospectively collected 18F-FDG PET data at baseline and early in treatment. Eleven patients either dropped out for clinical reasons (e.g., due to withdrawing consent, progressive disease, adverse event, or death) or did not have complete imaging data, leaving 125 patients (Table 1) who received treatment for at least 2 wk and who had 18F-FDG–avid PET scans at baseline.
Survival duration was defined from commencement of erlotinib treatment. Median survival in the OSI3926g and OAM4558g studies was 7.2 and 7.8 mo, respectively, whereas the time since initial NSCLC diagnosis was 10.3 and 11.8 mo. At the time of each study's closure, 34 cases were censored. Marginal distributions of several demographic (age, sex, smoking status) and clinical variables (performance status, EGFR mutation status, histology, survival duration) were strikingly similar in the 2 studies, justifying their combination. Except for viable tissue sample availability, which was higher in OAM4558g because of a mandatory tissue sampling requirement, there were no statistically significant differences in the baseline variables described in Table 1 at the 0.05 level of significance between the 2 studies.
Imaging Acquisition and Quantitation
All 125 patients underwent PET/CT using low-dose, unenhanced CT. Because these were multicenter trials, a range of scanners from different vendors was used, but serial scanning of each patient was performed on the same scanner, which was prospectively qualified by an imaging core laboratory, and scans were acquired according to an imaging charter designed to comply with guidelines from the National Cancer Institute on the use of PET for response evaluation (16). Baseline scans were to be obtained within 14 d before treatment initiation, and early-response scans were targeted for day 14 after the start of treatment, with a window of days 11–17 (OSI3926g) and days 10–14 (OAM4558g). The observed mean number of days elapsed between baseline and follow-up scans was 22.3 (range, 12–35 d) and 20.1 (range, 13–42 d), respectively, for OSI3926g and OAM4558g. Overall, 7 patients violated the imaging charter requirement with respect to scan day specifications.
Both trials specified a fasting time of at least 4 h. Audited imaging compliance parameters included 18F-FDG uptake time, administered activity, scanning direction and arm position, and pre–18F-FDG blood glucose levels. A high level of compliance with this charter within OSI3926g has previously been documented (13). The compliance with the imaging charter for OAM4558g was comparable to that of OSI3926g (17). Accordingly, it is reasonable to assume that the consistency of imaging methodology was good in individual patients within and between both trials. Two separate core laboratories, masked to clinical details, performed analysis of the PET results. For the OSI3926g trial, target lesions were selected first for significant 18F-FDG uptake (compared with adjacent background) and second for suitability for response evaluation on the unenhanced CT. The target lesion selection procedure in OAM4558g was the same but supported by diagnostic CT if available (otherwise by coacquired CT). Lesions with extensive necrosis at baseline, as indicated by central photopenia, were generally not selected. Newly detected lesions were defined as regions of interest that were not sufficiently 18F-FDG–avid at baseline to be definite sites of disease but subsequently determined to be above the diagnostic threshold at the week-2 scan. Radiologic responses were evaluated by different core laboratories on the basis of diagnostic CT with intravenous contrast at day 56 of treatment.
The average number of target lesions per patient in the 2 trials was 2.8 (OSI3926g) and 2.7 (OAM4558g), with a range of 1–5 lesions in both studies. More than one target lesion was present in 72.3% (OSI3926g) and 73.3% (OAM4558g) of patients. The mean baseline average maximum standardized uptake value (SUVmax) did not differ significantly between the trials (P = 0.195), at 6.2 (SD, 2.5) for OSI3926g and 7.6 (SD, 3.5) for OAM4558g. The ranges were 1.7–68.2 and 1.4–76.5, respectively. Notably, EGFR mutation status did not influence these baseline statistics. As measured by coacquired baseline CT, the mean sum of longest dimensions among target lesions on OSI3926g was 8.0 cm (SD, 4.3), with a range of 1.2–19.2 cm. The corresponding figures for OAM4558g were 7.8 cm (SD, 4.7), with a range of 1.1–18.6 cm.
In the absence of guidance on the optimal methodology to assess therapeutic response, we adopted a pragmatic approach of using the arithmetic mean of the percentage change in individual lesions identified prospectively on the baseline study. This approach is as near as possible to standard radiologic assessment wherein target lesions are selected and followed prospectively to assess response but identification of new lesions is deemed to indicate progressive disease. PET Response Criteria in Solid Tumors (PERCIST) (2) had not been published when this trial commenced.
Multivariate Survival Analysis
The Kaplan–Meier (18) procedure was used to provide nonparametric estimates of median survival and to produce descriptive plots of the survival function. To quantify and test whether changes in the tumor glucose uptake were related to length of survival, we used the AFT model. The AFT model is a commonly used tool in survival analysis (8,9) and relates the log of survival duration to a set of explanatory variables through a linear parametric form.
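The Kaplan–Meier estimate used for the descriptive plots can be sketched in a few lines. The following is a minimal illustration of the product-limit calculation, not the software used in the study; the function names and the toy follow-up times are hypothetical.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival) steps of the product-limit estimate.

    times  : follow-up duration per patient (e.g., months)
    events : 1 if death was observed, 0 if the case was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which the estimated survival is 0.5 or lower."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up


# Toy data (hypothetical, not from the trials): months and event flags.
toy_times = [2, 3, 3, 5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = median_survival(toy_curve)
```

Censored cases leave the risk set without producing a step in the curve, which is how the estimate uses partial follow-up information.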
The AFT model is specified as follows, with T representing the length of survival in months and X1,…, Xp denoting observations on p explanatory variables:
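A sketch of the standard log-linear AFT form that matches this description is given here. The coefficient symbols, the scale parameter σ, and the error term ε are notation assumed for illustration; ε is standard normal in the lognormal case fitted in the Results.

```latex
\log T = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \sigma\,\varepsilon
```

In this form, a one-unit increase in X_j multiplies survival time by exp(β_j), which is what makes the coefficients directly interpretable as effects on survival duration.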
On the basis of the information obtained by 18F-FDG PET, we evaluated the following AFT model:
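Using the variable names that appear in the text (PS, SUVbase, δSUVwk2, and new-lesion status NL), one plausible way to write such a model is sketched below. The ordering of terms and the coefficient symbols are assumptions, and EGFR mutation status is added as a further term later in the analysis.

```latex
\log T = \beta_0 + \beta_1\,\mathrm{PS} + \beta_2\,\mathrm{SUV_{base}}
       + \beta_3\,\delta\mathrm{SUV_{wk2}} + \beta_4\,\mathrm{NL} + \sigma\,\varepsilon
```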
We initially restricted our analyses to evaluating the predictive value of average percentage change in SUVmax expressed on a linear scale. We later extended the definition of δSUVwk2 to be based on that of the best-performing lesion (in percentage change from baseline), the worst-performing lesion, and the criteria given by PERCIST (2). The model can also handle thresholds; for example, if δSUVwk2 is less than −25%, as specified by the PET partial-response guidelines of the European Organisation for Research and Treatment of Cancer, one sets the explanatory variable associated with δSUVwk2 equal to 1 (and zero otherwise). However, the actual threshold to determine a response may vary across indications and therapies, and we defer to the “Discussion” section consideration of this type of model that includes estimation of the threshold.
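The alternative definitions of δSUVwk2 described above can be illustrated with a short sketch. The function names, dictionary keys, and example SUVmax values are hypothetical, and the PERCIST variant is not implemented here.

```python
from statistics import mean
from typing import Dict, List


def pct_change(baseline: float, week2: float) -> float:
    """Percentage change from baseline for one target lesion."""
    return 100.0 * (week2 - baseline) / baseline


def delta_suv_metrics(baseline: List[float], week2: List[float],
                      threshold: float = -25.0) -> Dict[str, float]:
    """Summaries of per-lesion SUVmax change for one patient.

    baseline, week2 : SUVmax of the same target lesions at the two scans
    threshold       : cutoff (in percent) for the binary response variable
    """
    changes = [pct_change(b, w) for b, w in zip(baseline, week2)]
    avg = mean(changes)
    return {
        "mean": avg,             # arithmetic mean across lesions
        "best": min(changes),    # lesion with the largest decrease
        "worst": max(changes),   # lesion with the smallest decrease or an increase
        "responder": 1 if avg < threshold else 0,  # thresholded covariate
    }


# Hypothetical patient with three target lesions.
example = delta_suv_metrics([8.0, 5.0, 10.0], [4.0, 5.0, 7.0])
```

For this example the per-lesion changes are −50%, 0%, and −30%, so the mean is about −26.7% and the patient is coded as a responder under the −25% cutoff, even though one lesion did not change at all.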
The model was fitted in a forward fashion by first evaluating the prognostic variables PS and SUVbase, followed by an evaluation of the predictor variables δSUVwk2 and
RESULTS
The Kaplan–Meier procedure gave an estimated median survival of 7.6 mo, with a 95% confidence interval from 6.9 to 9.2 mo (Fig. 1A). A lognormal AFT model without explanatory covariates was also fitted to the data, and the resulting parametric estimate of the survival function is plotted.
Baseline PS is clearly associated with survival duration (P < 4.5 × 10⁻⁴), with an R² value of 9.2% (Fig. 1B, which depicts Kaplan–Meier and AFT estimates of the survival functions for patients with baseline PS values of 0 [n = 34], 1 [n = 80], and 2 [n = 11]). To model the effect of baseline tumor burden (i.e., SUVbase), we evaluated both the sum of SUVmax across lesions and the simple metric given by the total number of target lesions. Because these variables give statistically indistinguishable (and significant) results, we let SUVbase be defined by the total number of target lesions. Both baseline ECOG performance status (P < 8.8 × 10⁻⁴) and the number of target lesions (P < 1 × 10⁻³) were statistically significant, with a resultant model R² of 16.4%. We next evaluated the predictive value of observing changes in 18F-FDG PET.
A scatterplot of the week-2 percentage change from baseline in SUVmax versus survival duration (in log scale; Fig. 2A) shows that large relative decreases in SUVmax appeared to be associated with longer survival times. Although the data are noisy, when evaluated as a linear relationship (i.e., as a correlation), the association between δSUVwk2 and survival duration is statistically significant (P < 0.028). The week-2 percentage change in SUVmax also remained significant when added to the prognostic model (P < 0.017), but the unique contribution of δSUVwk2 to R² was a low 3.3%. Thus, for these data, only substantial decreases in SUVmax were reliably associated with favorable survival, yet such changes occur mostly in EGFR mutant tumors.
New lesions (by 18F-FDG PET) were detected in 27 (of 125) patients on the day-14 scan. In contrast to changes in SUVmax, new-lesion status by 18F-FDG PET was a highly significant predictor of survival duration (P < 2.2 × 10−5). Most patients with shorter survival have new lesions (Fig. 2A), whereas most patients with longer survival do not exhibit new lesions. The parameter estimate associated with
Figure 2B is based only on the subset of patients (100/125) with known mutation status and demonstrates the effect of EGFR mutant tumors on survival duration. As can be seen, all (known) EGFR-mutant patients have a long survival duration (relative to the median), and most exhibit large reductions in SUVmax. To estimate the independent effect of mutation status on survival, we add the variable EGFR to the model that already includes PS, SUVbase, and
It is plausible that the average percentage change in SUVmax across lesions does not adequately capture clinical benefit and that perhaps a metric that focuses on the best-performing (or worst-performing) lesion would do better in terms of predicting overall survival (OS). To test this possibility, we defined δSUVwk2 by the minimum percentage change across lesions (i.e., the lesion with the largest decrease). However, this definition failed to predict OS even when entered by itself into the model (P = 0.35) or when added to the predictive model that already included PS, SUVbase, and
Returning to the final predictive model of Table 2, we note that NL remained highly statistically significant in the model that included PS, SUVbase, and EGFR mutation status. Thus, new-lesion status is a disease characteristic that is strongly related to survival but is not reflected by the baseline characteristics as measured by PS and SUVbase and is also not uniquely attributable to EGFR mutation status. In a limited patient sample such as this, this result would be difficult to demonstrate without the use of a multivariate modeling approach to control for confounding factors.
To illustrate the estimated model that includes performance, new-lesion, and EGFR mutation status, Figure 3 shows the estimated survival functions for EGFR wild-type patients with PS values of 0 (Fig. 3A) and 1 (Fig. 3B), with and without new lesions. Controlling for EGFR mutation status, the figures show that the appearance of a new lesion shifts the survival curve to the left and that this result holds independently of PS. Overlaid Kaplan–Meier curves validate the AFT model fits (Fig. 1). As seen in both plots, the median survival was approximately halved for patients with new lesions at day 14. Accordingly, when expressed using the proportional hazards model, lack of a newly detected lesion at day 14 was associated with a hazard ratio of approximately 0.45, or alternatively, greater than 2 when defined by appearance of new lesions. There are not enough data to perform this particular validation for a PS of 2 or across values of SUVbase.
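Two pieces of arithmetic in this paragraph can be checked directly. In a log-linear AFT model, a binary covariate with coefficient β multiplies every survival quantile, including the median, by exp(β), so a halved median corresponds to β = ln 0.5. Separately, a hazard ratio quoted for the absence of new lesions is the reciprocal of the one quoted for their presence. The sketch below uses hypothetical function names and an illustrative coefficient, not the estimate reported in Table 2.

```python
import math


def time_ratio(beta: float) -> float:
    """Multiplicative effect of a binary AFT covariate on survival time."""
    return math.exp(beta)


def invert_hazard_ratio(hr: float) -> float:
    """Hazard ratio for the opposite coding of a binary covariate."""
    return 1.0 / hr


# Illustrative coefficient chosen so that the median is exactly halved.
beta_new_lesion = math.log(0.5)              # about -0.693
ratio = time_ratio(beta_new_lesion)          # 0.5

# A hazard ratio of 0.45 for "no new lesion" corresponds to about 2.22
# for "new lesion present", i.e., greater than 2 as stated in the text.
hr_new_lesion = invert_hazard_ratio(0.45)
```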
DISCUSSION
Our results indicate that the detection of new lesions by PET at early response assessment is a strong, independent predictor of OS in refractory NSCLC patients treated with an EGFR inhibitor. The results also indicate that observed reductions in SUVmax are informative about survival only when such changes are large, yet they fail to reach significance when EGFR mutation status is considered.
Although the development of new lesions within 2 wk of starting therapy may seem to be an unlikely event, recent work demonstrates that rapid progression can occur in NSCLC (19). Hence, it is plausible that new lesions on 18F-FDG PET reflect aggressive biology, with new metastasis or growth of small lesions overcoming partial-volume effects within the steep component of the count-recovery curve leading to a significant increase in standardized uptake value (SUV) (20). By either mechanism, early disease progression would be expected to be associated with poor survival, consistent with our results.
New lesions were strikingly absent in patients with a δSUVwk2 of less than −35% (Fig. 2), further suggesting that these lesions represent disease progression. This observation suggests that a comparison of new-lesion counts (by PET) between treatment groups may be a way of detecting drug effects in clinical trials. The value of this observation should be clarified by systematic inclusion of 18F-FDG PET into placebo-controlled trials of molecularly targeted therapies. Moreover, our findings may not hold for patients treated with chemotherapy or in other cancers with more indolent behavior.
There is no clear-cut site predominance among the new lesions identified by 18F-FDG PET. Moreover, there are no significant differences in age, sex, baseline ECOG performance status, smoking status, KRAS mutation, or histologic subtype for patients with and without newly detected lesions. The newly detected lesions had an overall mean SUVmax of 6.1 (range, 1.3–16) and were clearly detected above adjacent background tissue.
When modeled as a linear predictor (see the “Results” section), we found that changes in SUVmax were a poor predictor of survival. However, inspection of Figure 2 shows that a drop of SUVmax by more than 40% fairly consistently predicts survival beyond the overall median value. In contrast, there is no correlation between δSUVwk2 and survival when δSUVwk2 drops by less than 40% (or when it is positive). On the basis of these data patterns, the data were explored by fitting a threshold model in which an optimal value of −35% was estimated for δSUVwk2, with a 95% confidence interval (based on the profile loglikelihood) from −50% to −30%. These results hold regardless of whether one accounts for the baseline prognostic variables and new-lesion status. Thus, in this setting, the relationship between changes in SUVmax and survival appears to be nonlinear, with a potentially larger response required for clinical significance than that needed after chemotherapy (e.g., −25%, as suggested by guidelines of the European Organisation for Research and Treatment of Cancer) (2,21).
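The idea of estimating a threshold by profile log-likelihood can be illustrated with a deliberately simplified sketch. It assumes a lognormal model with no censoring and a single binary covariate, so each candidate cutoff can be scored in closed form. The analysis described above also had to handle censored cases and additional covariates, which this sketch does not attempt. All names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def profile_loglik(delta: Sequence[float], log_t: Sequence[float], cut: float) -> float:
    """Maximized normal log-likelihood of log T given the indicator delta <= cut."""
    resp = [y for d, y in zip(delta, log_t) if d <= cut]
    rest = [y for d, y in zip(delta, log_t) if d > cut]
    rss = 0.0
    for group in (resp, rest):
        if group:
            m = sum(group) / len(group)
            rss += sum((y - m) ** 2 for y in group)
    n = len(log_t)
    sigma2 = rss / n  # maximum likelihood estimate of the error variance
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def fit_threshold(delta: Sequence[float], log_t: Sequence[float],
                  candidates: Sequence[float]) -> Tuple[float, float, float]:
    """Return (best cutoff, lower bound, upper bound) over a candidate grid."""
    ll = {c: profile_loglik(delta, log_t, c) for c in candidates}
    best = max(ll, key=ll.get)
    # Approximate 95% interval: cutoffs whose log-likelihood lies within
    # 1.92 (half the 0.95 quantile of chi-square with 1 df) of the maximum.
    keep = [c for c in candidates if ll[best] - ll[c] <= 1.92]
    return best, min(keep), max(keep)


# Toy data (hypothetical): percentage change in SUVmax and log survival time.
toy_delta = [-60, -52, -44, -38, -31, -22, -15, -5, 4, 12]
toy_log_t = [3.0, 3.2, 2.9, 3.1, 2.0, 1.8, 2.1, 1.9, 2.2, 2.0]
best_cut, lower, upper = fit_threshold(
    toy_delta, toy_log_t, [-50, -45, -40, -35, -30, -25, -20])
```

Because the toy data contain a sharp step, the interval collapses to the single best grid value; with noisier data, neighbouring cutoffs would fall inside the interval.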
For these reasons, a partial metabolic response by 18F-FDG PET, defined by a change in average SUVmax below −25% and the absence of new lesions, is mainly informative about survival because of the strong effect due to NL. In fact, on the basis of a predefined cutoff of −25%, changes in SUVmax are not predictive of OS (P = 0.093). Our results differ from the primary report of one of the studies (14), likely because we have taken covariates into account.
The finding that EGFR mutation predicts improved survival (Table 2) is consistent with the recently completed phase III European Tarceva versus Chemotherapy study, which shows improved clinical response in this subpopulation (22). In our study, for the subset of patients in whom EGFR mutation status was available, this variable was able to replace a δSUVwk2 of −35% or less as a predictor, with a superior result. In accordance, Figure 2 suggests that a δSUVwk2 of −35% or less is strongly linked to EGFR mutation status.
One plausible reason why the average change in SUVmax is not a particularly sensitive predictor for OS may be the large degree of heterogeneity between lesions within patients. For example, among the 91 patients with at least 2 target lesions, the average range between the best- and worst-responding lesions (in terms of percentage change from baseline) is 38 percentage points. Moreover, 25% of these 91 patients have a range greater than 50 percentage points, and 5 patients have a range greater than 100 percentage points. However, as noted in the “Results” section, neither the single best-performing (or worst–performing) lesion nor the PERCIST criteria improve the predictive power of these models. To obtain a clinically useful metric, we speculate that one may need to require some degree of agreement across the within-patient lesion changes in addition to reduction in 18F-FDG uptake.
Performance status before treatment (PS) was found to be an important predictor of OS (Table 2), confirming results reported for another NSCLC study involving treatment with targeted drugs and conventional chemotherapy (11). That study also identified the baseline sum of the longest dimensions as another significant predictor of survival. Here, similar to previously published studies (23,24), our study demonstrated the prognostic value of baseline 18F-FDG PET. However, these studies did not account for performance status, and it has been noted that the prognostic value of baseline 18F-FDG PET may also depend on histologic subtype (24).
The described analyses can also be performed using the semiparametric proportional hazards model (25,26). However, for ease of interpretation, and to enable a straightforward comparison with previous parameter estimates in NSCLC populations (11), we chose the AFT modeling framework. In addition, the parametric survival model enables simulation of hypothetical clinical trial outcomes (10). An excellent tutorial review of survival analysis in the context of cancer research, including a comparison of the AFT- and Cox-regression frameworks, is given in the study by Bradburn et al. (27).
CONCLUSION
According to this study of 125 second- or third-line NSCLC patients treated with erlotinib, 18F-FDG PET early after beginning treatment with a targeted drug can carry information about OS. Newly detected lesions appear to be more informative for OS than changes in SUVmax. Placebo-controlled clinical trials that include 18F-FDG PET will have to clarify to what extent early 18F-FDG PET responses to targeted drugs help identify drugs that prolong survival.
DISCLOSURE STATEMENT
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
We gratefully thank Bernard Fine, Andrea Pirzkall, John Bothos, Premal Patel, Jill Fredrickson, and Alex de Crespigny for their stewardship of OSI3926g and OAM4558g. We also thank David Binns and Jason Callahan for their technical expertise. We extend our sincere gratitude to the patients of both studies. No potential conflict of interest relevant to this article was reported.
Footnotes
Published online Mar. 13, 2012.
- © 2012 by the Society of Nuclear Medicine, Inc.
- Received for publication May 9, 2011.
- Accepted for publication November 29, 2011.