Abstract
Despite recognized limitations, structural imaging with CT remains the standard technique for evaluating the response of lung cancer to both chemotherapy and radiotherapy. This evaluation has become increasingly important with the advent of neoadjuvant therapy before surgery. The high uptake of 18F-FDG in most lung cancers and the demonstration that successful treatment reduces uptake have led to increasing enthusiasm for the use of PET and PET/CT to assess the therapeutic response. In this review, theoretic considerations and current evidence supporting the role of 18F-FDG PET are discussed.
Until recently, early detection and surgery were virtually the only hope of cure for patients with lung cancer. In addition, because most patients were not diagnosed as having malignancy until they presented with symptoms related to advanced disease, long-term survival for patients with this type of cancer was rare. Even the most recent data from the Surveillance, Epidemiology, and End Results program show that at least half of all patients with lung cancer present with already disseminated disease. In the United States, mortality rates from lung cancer have fallen significantly in men over the last 2 decades, but only in the last few years have they fallen in women. For both men and women and for all age groups, there has remained a close association between incidence and mortality, suggesting that there has been relatively little improvement in the survival of patients diagnosed with lung cancer over many decades (1). A decline in smoking rates appears to have been the major causal factor in the observed reduction in mortality from lung cancer. Lung cancer remains one of the major causes of cancer-related death throughout the developed world and is becoming an increasing health problem in developing countries, where smoking rates remain high. It is estimated, for example, that one-third of the world's smokers live in China. The most common form of lung cancer is non–small cell lung cancer (NSCLC).
In an effort to improve the survival of patients with NSCLC, there has been a focus on earlier diagnosis and, once cancer has been confirmed, on improved treatment selection and planning. The advent of PET and, in particular, the widespread implementation of PET/CT with 18F-FDG have allowed more accurate detection of both nodal and distant forms of metastatic disease (2). This advance has been complemented by improved methods for sampling mediastinal nodes, including endoscopic ultrasound-guided biopsy, with many such studies providing complementary information (3). The consequences of more sensitive detection of metastatic disease are that fewer patients are likely to undergo futile thoracotomy (4) and that more patients will be identified as requiring aggressive locoregional or systemic treatment. The detection of previously occult systemic metastases is particularly likely in patients who would have formerly been given radiotherapy with a curative intent (5). As a corollary of more sensitive detection of metastatic disease, more patients are deemed not to be candidates for curative treatment with a single modality but nevertheless to have a relatively good performance status and therefore to be candidates for more aggressive palliative regimens. Although at least 169 different prognostic factors for predicting the survival of patients with NSCLC have been identified, tumor stage remains the most important of these (6).
Complementing the improved definition of the extent of disease, an ever-increasing array of therapeutic options for NSCLC is being applied in the hope of changing the dismal survival that has traditionally accompanied this diagnosis. These options include more aggressive surgical techniques and the use of neoadjuvant radiotherapy and chemotherapy regimens before surgery (7). There are also adjuvant therapies for patients in whom there is known or suspected incomplete resection. Even for patients with distant metastatic disease, new chemotherapeutic agents and molecularly targeted therapies have been shown to improve survival or quality of life (8,9). However, these innovations have led to spiraling costs and, in some cases, increased morbidity while yielding only modest improvements in survival for patients with NSCLC, particularly early-stage disease (10,11). Therefore, there is a pressing need to validate the effectiveness of treatment in individual patients as well as in specific groups of NSCLC patients for the purpose of developing appropriate treatment guidelines. Early and robust identification of a response is needed to allow the termination of ineffective agents and a change to alternative management that may be more efficacious. In clinical trials, an assessment of the therapeutic response is used to assess biologic efficacy and as a potential surrogate of survival. These data are vital to decisions about the allocation of scarce health care resources.
At present, an assessment of the therapeutic response in NSCLC is primarily based on changes in the measured dimensions of lesions identified on CT. These changes are graded on the basis of definitions detailed in Response Evaluation Criteria in Solid Tumors (RECIST) (12). These criteria represent a modification of much earlier World Health Organization (WHO) response criteria (13). Both include definitions for a complete response, a partial response, stable disease, and progressive disease, which are based on the percentage change in lesion dimensions. Similarly, both systems can be traced to an even earlier study that assessed the reliability with which experienced clinicians were able to estimate the sizes of simulated lesions (wooden spheres of various sizes under a foam mattress) by using palpation and physical measurements (14). That study had nothing to do with imaging but, because it was based on the physical characteristics of lesions and the reproducibility of the measurements of lesion dimensions, there was at least some tenuous logic for extrapolating data from that study for the purpose of defining criteria with which CT dimensions might be used to assess a therapeutic response.
The assumption that changes in tumor dimensions, as measured with CT or any other anatomic imaging modality, are a true marker of therapeutic efficacy is fraught with difficulty because tumors comprise variable proportions of malignant cells, stroma, and inflammatory cells. The regression of all of these components may occur slowly and incompletely. Even after cure of NSCLC, fibrotic masses may remain. Response assessment with CT is especially unreliable in lung cancer because of the inaccuracy of initial lymph node staging: lesions that harbor disease may be ignored, whereas lesions that do not may be assessed for a response. For the primary tumor, atelectasis (Fig. 1) confounds the definition of lesion dimensions before treatment, and radiation pneumonitis and subsequent fibrosis (15) confound it after treatment. Some of these limitations are compensated for by the use of serial CT after treatment. Further regression at a later time point is generally accepted as evidence of a therapeutic benefit, whereas an increase in size is deemed to represent progression. However, some cancers have very long doubling times and therefore may not demonstrate appreciable enlargement for months or even years, whereas more aggressive cancers may enlarge within weeks. These circumstances create uncertainty regarding the optimal timing of follow-up imaging and can lead to residual disease going unrecognized until it is too late for salvage therapy. Conversely, a slow regression in lesion size after treatment may lead to the unnecessary prolongation of treatment or even to the institution of more aggressive therapies in the mistaken belief that there has been a poor response to earlier management.
18F-FDG PET FOR THERAPEUTIC MONITORING IN NSCLC
Molecular imaging offers the potential to characterize the nature of tissues on the basis of their biochemical and biologic features. Thus, the information provided is fundamentally different from that provided by anatomic imaging. The imaging of changes in glucose metabolism, as reflected by cellular uptake and trapping of the glucose analog 18F-FDG, can provide a response assessment that is both more timely and more accurate than that provided by standard structural imaging.
One of the major theoretic advantages of 18F-FDG PET compared with structural imaging techniques is that there is usually a more rapid change in cellular metabolism than in tumor size. Although there is growing clinical recognition of the role of PET in early therapeutic response assessment, the preferred methodology remains controversial. The European Organization for Research and Treatment of Cancer (EORTC) developed guidelines for the methodology of performing serial 18F-FDG PET evaluations and reporting metabolic responses (16). These guidelines were recently updated in a consensus statement from the National Institutes of Health in the United States (17). Unfortunately, both consensus statements have failed to resolve ongoing debate regarding the parameters that should be measured in determining a therapeutic response, when they should be measured, or how a response should be defined.
The uptake of 18F-FDG can be assessed by qualitative, semiquantitative, and quantitative means. Each has advantages and limitations. With whole-body PET/CT becoming the standard of care for cancer staging because of its high diagnostic accuracy (18) and ability to provide a rapid survey for both regional and distant forms of metastatic disease, there are practical benefits of using this type of study as the baseline evaluation for a therapeutic response assessment. These benefits are particularly pertinent to the evaluation of regional lymph nodes that are the target of neoadjuvant therapy before surgery because PET/CT appears to have an accuracy (19) significantly higher than that previously reported for either CT or stand-alone 18F-FDG PET (20). Both 18F-FDG PET and 18F-FDG PET/CT are suitable for either qualitative or semiquantitative analysis. However, qualitative assessment is perceived to be subjective, whereas semiquantitative measures can be influenced by technical and physiologic factors. On the other hand, the dynamic imaging protocols that are needed for quantitative analysis require prospective determination of the region to be imaged and are able to assess only lesions that can be included in a single axial field of view. Therefore, these protocols usually must be followed by a delayed whole-body scan if a survey for metastatic disease is planned. Such acquisition protocols are not well suited to routine clinical use, despite theoretic advantages with respect to biologic characterization.
The majority of reports regarding the use of 18F-FDG PET for therapeutic response assessment have involved either qualitative or semiquantitative evaluation. The term “metabolic response” is now widely used to denote the degree of change in 18F-FDG uptake in target lesions. For reporting the results of serial 18F-FDG PET scans, it is important to use standardized nomenclature that can be applied to all tumor types and that can be consistently used by different individuals and institutions. Most importantly, it must be readily understood by clinicians. In the visual grading schema described by MacManus et al. (21), a complete metabolic response (CMR) is defined as a return of 18F-FDG uptake in previously documented lesions to a level equivalent to or lower than the radioactivity in normal tissues in the involved organ. In essence, a CMR reflects a loss of features suggesting residual malignancy. It does not necessarily represent normalization of the scan results because posttreatment changes may remain, particularly after radiotherapy (22). A partial metabolic response (PMR) constitutes a significant reduction in 18F-FDG uptake in tumor sites, judged by visual inspection of appropriately displayed comparative images, with residual abnormality suggesting malignancy. Stable metabolic disease (SMD) is defined by a lack of change, whereas progressive metabolic disease (PMD) is characterized by an increase in the extent of metabolic abnormality in a pattern consistent with tumor growth or by the development of new sites of disease. To decrease the potential subjectivity of a visual analysis, images must be displayed consistently, with normalization to appropriate reference tissues. Because hepatic glucose uptake, and therefore 18F-FDG uptake, is relatively stable under fasting conditions, the liver is a useful reference tissue.
For categories that involve a qualitative change in the intensity of uptake relative to that in a reference tissue, that is, PMR and PMD, the measurement of tracer uptake could be a useful means for validating the qualitative impression.
As discussed in more detail elsewhere in this supplement to The Journal of Nuclear Medicine, the semiquantitative parameter of 18F-FDG uptake that is currently most widely used for the assessment of a therapeutic response in tumors is the standardized uptake value (SUV). The SUV is derived by dividing the measured radioactivity concentration in tissue by the administered activity per unit of body weight, after the application of various correction factors (such as correction for radioactive decay). The maximum SUV (SUVmax) in a lesion is the most reproducible parameter and has therefore become the preferred parameter for assessing the therapeutic response. One of the concerns regarding the use of the SUVmax as a parameter for response assessment is that it ignores changes in the distribution of a tracer within a lesion and in the extent of metabolic abnormality. Accordingly, a rapidly growing tumor undergoing central necrosis may show no change in the SUVmax or may even show a reduction as the thin rim of viable tumor becomes subject to partial-volume effects. Therefore, alternative metabolic parameters that integrate both tumor volume and the intensity of uptake have been suggested. Global or total glycolytic volume is one such parameter (23). When the volume component of this parameter is based on anatomic imaging, the same limitations that accompany RECIST measures are likely to apply. Accordingly, semiautomated methods are being used to provide a more detailed evaluation of metabolic volumes, a process that may be particularly helpful for lesions that are not well suited to RECIST (24). Metabolic volumes may be helpful in the setting of necrotic tumors, tumors associated with distal atelectasis, or posttreatment changes on follow-up CT.
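As a worked illustration, the body-weight SUV can be written as tissue activity concentration divided by (decay-corrected injected activity / body weight). The Python sketch below assumes units of kBq/mL for tissue, MBq for injected activity, and kg for weight, a tissue density of 1 g/mL, and an 18F half-life of about 109.8 min. It also shows one common way to compute a volume-integrated parameter, SUVmean multiplied by metabolic volume (often called total lesion glycolysis), using a fixed fraction of SUVmax as the segmentation threshold. The 40% threshold and all function names are hypothetical choices for illustration, not the methods of references (23) or (24).

```python
import math

# Approximate physical half-life of fluorine-18, in minutes.
F18_HALF_LIFE_MIN = 109.8


def decay_corrected_activity(injected_mbq: float, minutes_since_injection: float) -> float:
    """Injected activity remaining at scan time, assuming simple exponential decay."""
    decay_constant = math.log(2) / F18_HALF_LIFE_MIN
    return injected_mbq * math.exp(-decay_constant * minutes_since_injection)


def suv_bw(tissue_kbq_per_ml: float, injected_mbq: float, weight_kg: float,
           minutes_since_injection: float) -> float:
    """Body-weight SUV = tissue concentration / (decay-corrected activity / weight).

    kBq/mL divided by MBq/kg (= kBq/g) leaves g/mL; assuming a tissue
    density of 1 g/mL makes the result dimensionless.
    """
    activity_mbq = decay_corrected_activity(injected_mbq, minutes_since_injection)
    return tissue_kbq_per_ml / (activity_mbq / weight_kg)


def total_lesion_glycolysis(voxel_suvs: list[float], voxel_volume_ml: float,
                            threshold_fraction: float = 0.4) -> tuple[float, float, float]:
    """Return (SUVmean, metabolic volume in mL, SUVmean x volume).

    Hypothetical segmentation: keep voxels whose SUV is at least
    threshold_fraction * SUVmax (fixed-fraction threshold, for illustration only).
    """
    suv_max = max(voxel_suvs)
    inside = [s for s in voxel_suvs if s >= threshold_fraction * suv_max]
    volume_ml = len(inside) * voxel_volume_ml
    suv_mean = sum(inside) / len(inside)
    return suv_mean, volume_ml, suv_mean * volume_ml


# Hypothetical lesion: 10 kBq/mL, 370 MBq injected, 70-kg patient, scanned 60 min after injection.
example_suv = suv_bw(10.0, 370.0, 70.0, 60.0)
```

For the hypothetical lesion, about 68% of the injected activity remains at 60 min, giving an SUV of roughly 2.8. Note that the volume-integrated value depends directly on the segmentation threshold chosen.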
To increase the reproducibility of the SUV over time in a given patient, it is necessary to adhere to strict acquisition and analysis protocols, including the use of the same preparation regimen, uptake time, and scanning parameters (ideally, the use of the same scanner maintained with regular quality assurance by use of calibration phantoms) (25). On the basis of studies in which untreated tumors were compared on 2 occasions, relative changes greater than approximately 20% are very unlikely to be attributable to measurement error or spontaneous variability in tumor metabolism (26,27). With attention to detail, even less variation between studies might be expected (28). However, in the lungs, special circumstances can also significantly influence the measured SUV, particularly in PET/CT acquisitions. Although there is good evidence that SUVs obtained with PET/CT scanners generally do not differ substantially from those obtained with stand-alone PET scanners (29), the use of a CT attenuation map may lead to significant variability in the measured SUV in lung lesions because of respiratory motion (Fig. 2) and the assignment of inappropriate attenuation characteristics (30), particularly when the CT component of the device has fewer than 6 detector rows (31). Although there have been recent efforts to harmonize the acquisition, processing, and analysis of 18F-FDG PET scans within clinical trials (32), continuing evolution in PET/CT technology, in particular, improvements in resolution and reconstruction algorithms, poses significant challenges for comparing measures from one scanner to the next and from one center to another (33).
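A minimal sketch of how the test–retest figure might be applied to a single lesion follows. The 20% band comes from the studies cited above (26,27); the function names are hypothetical. The check only separates probable biologic change from measurement noise under a constant protocol and scanner; it is not a validated definition of response.

```python
def percent_change(baseline_suv: float, follow_up_suv: float) -> float:
    """Relative change from baseline in percent (negative values are decreases)."""
    if baseline_suv <= 0:
        raise ValueError("baseline SUV must be positive")
    return 100.0 * (follow_up_suv - baseline_suv) / baseline_suv


def outside_test_retest_band(baseline_suv: float, follow_up_suv: float,
                             band_percent: float = 20.0) -> bool:
    """True if the change exceeds +/- band_percent.

    The 20% default mirrors the test-retest figure quoted in the text. It
    flags changes unlikely to be measurement noise; it is not a validated
    response criterion, and it assumes both scans used the same protocol
    and scanner.
    """
    return abs(percent_change(baseline_suv, follow_up_suv)) > band_percent
```

For example, a fall in SUVmax from 10.0 to 7.0 (-30%) lies outside the band, whereas a fall to 8.5 (-15%) does not.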
There is not yet a consensus regarding the degree of reduction in the SUV that should define a metabolic response or how this reduction should be calculated when there are multiple lesions. Neither is it clear at what point in treatment, or how long after a particular therapeutic intervention, this measure should be obtained. If the goal is for the response measured on PET to serve as a surrogate of the direct biologic effects of a treatment on at least a subpopulation of the cells within a lesion, then it is probably reasonable to assess the response relatively early in treatment and relatively soon after the particular treatment has been delivered. If, however, the response on PET is intended to serve as a surrogate for survival, then, from an oncologic perspective, PET should probably be performed after the acute metabolic effects of therapy have worn off, when the residual signal primarily reflects the number of viable cells. In this setting, the least responsive lesion is likely to determine the patient's eventual outcome with respect to cure versus residual disease or relapse. However, the response across all lesions may determine the time to progression in patients who are not cured. Even a rudimentary understanding of tumor biology and of the diagnostic performance of restaging techniques makes it clear that the time to progression in patients who are not cured is likely to be a function of both the aggressiveness of the tumor (the rate of cellular repopulation) and the sensitivity of the tests used to detect disease progression. It is reasonable to expect that the higher sensitivity of 18F-FDG PET than of CT in primary disease staging will be replicated in the restaging setting and therefore that the apparent progression-free survival in patients evaluated by PET will be shorter than that in patients in whom disease progression is based on RECIST.
Studies of the role of 18F-FDG PET in the evaluation of suspected residual or recurrent NSCLC have certainly demonstrated accuracy and prognostic stratification significantly higher than those achieved with conventional imaging (34–36).
Most studies examining the role of PET in therapeutic response assessment in NSCLC have been aimed at the response in either the primary tumor alone or the most metabolically active mediastinal node. However, for systemic disease, some method for providing an aggregate response is required. A technique for the aggregation process has not been defined, but the simplest potential approach is to average the absolute percentage changes in individual target lesions. Although it is commonly assumed that the SUVs of multiple lesions follow a normal distribution, there is evidence that a lognormal distribution more accurately describes the distribution of SUV across lesions (37). Indeed, the initial SUVmax measurements for primary NSCLC lesions in patients undergoing a baseline evaluation before neoadjuvant therapy demonstrated a lognormal distribution with a median of 10.0 (50% confidence interval, 6.8–13.6) (38). An alternative approach is to use the difference in the sums of the log SUVmax measurements for target lesions at baseline and at follow-up as a percentage of the average individual baseline log SUVs. This approach minimizes the impact of very high SUVs in individual lesions, whose activity has the capacity to fall much further than that in lesions with less intense uptake.
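The aggregation options discussed above can be compared on a toy example. The sketch below computes the mean of per-lesion percentage changes, the change in the least responsive lesion, and a log-based summary. For the log approach we assume one reading of the description: the change in summed natural-log SUVmax as a percentage of the summed baseline log SUVmax (equivalent to the mean difference over the mean baseline). This interpretation, the requirement that baseline SUVmax exceed 1 so that the denominator is positive, and all function names are our assumptions.

```python
import math


def percent_change(baseline: float, follow_up: float) -> float:
    """Per-lesion relative change in percent (negative values are decreases)."""
    return 100.0 * (follow_up - baseline) / baseline


def mean_percent_change(baselines: list[float], follow_ups: list[float]) -> float:
    """Simplest aggregate: average of the per-lesion percentage changes."""
    changes = [percent_change(b, f) for b, f in zip(baselines, follow_ups)]
    return sum(changes) / len(changes)


def least_responsive_change(baselines: list[float], follow_ups: list[float]) -> float:
    """Change in the worst-responding lesion (the largest, i.e. least negative, value)."""
    return max(percent_change(b, f) for b, f in zip(baselines, follow_ups))


def log_sum_change(baselines: list[float], follow_ups: list[float]) -> float:
    """Change in summed ln(SUVmax) as a percentage of summed baseline ln(SUVmax).

    One reading of the log-based approach (an assumption). Requires every
    baseline SUVmax > 1 so the denominator is positive; the result does not
    depend on the logarithm base.
    """
    if any(b <= 1.0 for b in baselines):
        raise ValueError("baseline SUVmax values must exceed 1")
    base = sum(math.log(b) for b in baselines)
    follow = sum(math.log(f) for f in follow_ups)
    return 100.0 * (follow - base) / base
```

For two hypothetical lesions falling from SUVmax 20 to 10 and from 4 to 3, the three summaries are -37.5%, -25%, and about -22.4%, respectively, which shows how much the choice of aggregate can matter.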
In either case, a significant issue is how far the activity in a lesion that is only slightly higher than the adjacent background activity can fall before it becomes indistinguishable from the background activity. Because the background SUV is substantially lower in the lung than in the mediastinum or liver, the activity in a lung lesion of a given intensity can fall further than that in a mediastinal or hepatic lesion of the same intensity before it is no longer discernible. One of the important limitations of setting a specific percentage reduction as a representation of a given category of response is that the complete eradication of viable cancer cells will seldom lead to an SUVmax of zero. More often, the activity will return to the background levels in the tissues in which the lesions arise. Accordingly, the activity in a lesion that starts with a low SUVmax but achieves a complete metabolic and pathologic response may be limited to a maximum possible reduction of 30%–50% before it reaches background activity levels, whereas the activity in a lesion with a very high SUVmax may fall by a comparable percentage and still be markedly higher than background activity levels, be classified as a PMR, and yet contain extensive tumor on pathologic evaluation. The problem with trying to assess responses in such cases could be overcome by limiting assessment to lesions with a high baseline SUV, thereby allowing a significant dynamic range through which the SUV could fall relative to the values for reference background tissues. Unfortunately, there are some histologic types of NSCLC that have an intrinsically low SUVmax. A recent study of 53 patients with 57 pathologically proven lesions showed that the 26 lesions of pure bronchioloalveolar carcinoma (BAC) had a median SUVmax of only 1.48 (range, 0.63–4.54), with 81% of BAC lesions having an SUVmax of less than 2.5, a cutoff level often used to differentiate benign from malignant lesions (39).
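The ceiling on the percentage fall can be made explicit with simple arithmetic. The sketch assumes, as the text does, that a complete response brings a lesion down to background rather than to zero. The background values used below (2.0 for mediastinum, 0.5 for lung) and the function names are hypothetical round numbers chosen for illustration.

```python
def max_reduction_to_background(baseline_suvmax: float, background_suv: float) -> float:
    """Largest percentage fall possible before a lesion reaches background.

    A complete response is assumed to bring the lesion down to background,
    not to zero.
    """
    if baseline_suvmax <= background_suv:
        return 0.0
    return 100.0 * (baseline_suvmax - background_suv) / baseline_suvmax


def lesion_to_background_after_fall(baseline_suvmax: float, percent_fall: float,
                                    background_suv: float) -> float:
    """Lesion-to-background ratio remaining after a given percentage fall."""
    residual = baseline_suvmax * (1.0 - percent_fall / 100.0)
    return residual / background_suv
```

With these illustrative backgrounds, a mediastinal lesion starting at SUVmax 4.0 can fall by at most 50%, and the same lesion in lung by at most 87.5%. By contrast, a lesion starting at 20.0 that falls by 50% still sits at 5 times the mediastinal background.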
This value was significantly lower (P < 0.0001) than that for lesions containing both adenocarcinoma and BAC, for which the median SUVmax was 6.03 (range, 2.45–24).
The percentage reduction that should constitute a therapeutic response is also controversial. Although the EORTC tried to provide guidance, it basically adopted the same concept that had been used by earlier WHO and RECIST committees and simply considered the reproducibility of the measure. The data on the test–retest reliability of the SUVs of individual lesions and the intra- and interobserver variability of SUV determinations suggest that changes of greater than 20% are generally larger than would be predicted by methodologic factors. Accordingly, a decrease or an increase of a larger percentage likely reflects a biologic effect. However, such changes are not necessarily biologically or prognostically significant. Herein lies one of the problems associated with a standardized definition of a response. Both managing clinicians and regulatory bodies want tests that predict outcomes. With the goal of improving the performance of 18F-FDG PET in fulfilling this function, the degree of reduction in the SUV that defines a favorable therapeutic response has generally been determined post hoc, by using receiver operating characteristic curves to find the threshold of change that best stratifies patient populations according to some other measure of outcome. Consequently, the definitions of a response obtained with this methodology have varied depending on the modality of therapy, the timing of the therapeutic response scan relative to the treatment, the duration of the treatment before the assessment, and the method used as the validating standard. Irrespective of the methodology used, responding lesions usually show a reduction in 18F-FDG uptake, and the larger the fall and the earlier that it occurs, the greater the likelihood that the patient will derive a prognostic benefit from the therapy.
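The post hoc threshold selection described above can be sketched as follows. Youden's J (sensitivity + specificity - 1) is used here as one common criterion for the "best" cutoff; the cited studies may have used others. The data and function names are invented for illustration. Because the cutoff is fitted to the same patients used to report its performance, the resulting accuracy tends to be optimistic, which is one reason published thresholds vary.

```python
def best_threshold_youden(scores: list[float], labels: list[int]) -> tuple[float, float, float]:
    """Pick the cutoff maximizing Youden's J = sensitivity + specificity - 1.

    scores: percentage fall in SUV (larger = more likely response).
    labels: 1 = responder, 0 = non-responder by the validating standard.
    A lesion is called a responder when its score >= cutoff.
    Returns (cutoff, sensitivity, specificity).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


def auc_mann_whitney(scores: list[float], labels: list[int]) -> float:
    """ROC area as P(responder score > non-responder score), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: percentage fall in SUVmax and pathologic response (1 = responder).
falls = [80.0, 65.0, 60.0, 45.0, 40.0, 30.0, 20.0, 10.0]
responders = [1, 1, 0, 1, 0, 0, 0, 0]
```

On these invented data, the selected cutoff is a 45% fall (sensitivity 100%, specificity 80%), and the area under the curve is 14/15 (about 0.93).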
However, the variability in the definition of response criteria that provide the best predictive value for a subsequent therapeutic benefit is of concern to those seeking to standardize response criteria for PET. Nevertheless, as with the validation of the utility of 18F-FDG PET in staging, the strongest evidence for the utility of 18F-FDG PET in therapeutic response assessment will be derived from data demonstrating a stratification of survival more powerful than that achieved with conventional RECIST. There is now increasing evidence that both qualitative and semiquantitative measures of 18F-FDG PET provide this advantage.
SUPPORTING EVIDENCE FOR THERAPEUTIC RESPONSE ASSESSMENT WITH 18F-FDG PET IN NSCLC
Theoretic Considerations
Treatment frequently alters tumor bioenergetics and, thereby, 18F-FDG uptake before and sometimes independent of cell killing. The 18F-FDG signal obtained from a lesion is dependent on both the number of viable cells and the uptake of the tracer in each cell. A recent 18F-FDG PET study evaluating serial changes in the SUV during chemotherapy for NSCLC in 16 patients demonstrated that a 50% or greater reduction in the SUVmax between studies performed after 1 and 3 wk of therapy predicted survival of more than 6 mo, whereas patients with a less marked SUV reduction died within 6 mo (40). However, a negative slope of the regression line fitted to SUVs measured at various time points during treatment appeared to confer a clinical benefit. It follows that the percentage reduction that predicts a response may depend on the timing of the follow-up scan. A similar study of 15 patients receiving radiotherapy demonstrated that the peak residual 18F-FDG activity and qualitative response obtained during treatment correlated with the overall response obtained 3 mo after treatment (41), which was previously shown to predict outcome (21). The method used to validate the utility of 18F-FDG PET is also likely to influence its apparent performance. Because histopathology can detect disease deposits below the resolution of PET and can differentiate inflammatory changes from residual disease (Fig. 3), the apparent performance of 18F-FDG PET in assessing a therapeutic response is likely to be worse in studies with histopathology as the reference standard than in studies with serial imaging or survival as the reference standard, in which outcome is not necessarily related to cure. In such studies, the utility of 18F-FDG PET in therapeutic response assessment is demonstrated by its ability to provide prognostic stratification.
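To illustrate why a regression slope and a two-point percentage fall can disagree, the sketch below fits an ordinary least-squares line to serial SUVmax values. It then compares the slope with the percentage fall between the week-1 and week-3 scans. The SUV values, the baseline scan at day 0, and the function name are hypothetical.

```python
def least_squares_slope(days: list[float], suvs: list[float]) -> float:
    """Ordinary least-squares slope of SUV against time (SUV units per day)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(suvs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, suvs))
    return sxy / sxx


# Hypothetical patient: scans at baseline, week 1, and week 3 of chemotherapy.
days = [0.0, 7.0, 21.0]
suvmax = [10.0, 8.0, 6.0]
slope = least_squares_slope(days, suvmax)
week1_to_week3 = 100.0 * (suvmax[2] - suvmax[1]) / suvmax[1]
```

Here the slope is negative (about -0.18 SUV units per day), yet the fall between weeks 1 and 3 is only 25%, well short of a 50% threshold.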
Use of Histopathology to Validate Performance of 18F-FDG PET in Response Assessment
The objective of cancer treatment is to eradicate all malignant cells and, thereby, to cure the patient. However, current treatments are relatively ineffective in achieving this objective and, even when a complete pathologic response in all resected material is achieved, a significant number of patients still succumb to cancer, suggesting sampling error. An early study evaluating the prognostic value of a complete pathologic response in NSCLC patients found that the 5-y survival rate in 21 patients undergoing complete resection after neoadjuvant therapy was only 54% (42). As judged by histopathology, relatively few patients show a complete pathologic response to aggressive neoadjuvant regimens. In a trial evaluating the predictive value of 18F-FDG PET in assessing the therapeutic response 2 wk after the completion of preoperative chemoradiotherapy in 26 patients, the complete pathologic response rate was only 31% (43). As would be expected with such a low rate of tumor clearance and the limited spatial resolution of 18F-FDG PET, the negative predictive value of 18F-FDG PET was only 55% (on the basis of visual criteria), whereas the positive predictive value was 80%. Review of the raw data for patients with and without a complete pathologic response in that study (43) showed that all but one patient had a reduction in the SUV between pretreatment and posttreatment scans. However, the percentage reduction in complete pathologic responders tended to be greater than that in incomplete responders. Furthermore, the posttreatment SUV tended to be higher than 5 in the majority of patients with residual disease, except in those who had a low pretreatment SUV, whereas in the group without residual disease, 4 of 5 patients had a posttreatment SUV of less than 5. Using a cutoff of 3.0, the authors reported a sensitivity of 88% for residual disease (43).
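The low negative predictive value follows partly from arithmetic. When most patients still harbor residual disease (here about 69%, the complement of the 31% complete pathologic response rate), false-negative scans become common relative to true-negative scans. The sketch below applies Bayes' rule; the 80% sensitivity and specificity are hypothetical round numbers, not values reported in study (43), and the function name is ours.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """PPV and NPV of a test for residual disease, by Bayes' rule.

    prevalence is the fraction of patients with residual disease.
    """
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test with 80% sensitivity and specificity.
_, npv_high_residual = predictive_values(0.8, 0.8, 0.69)  # 31% complete response
_, npv_low_residual = predictive_values(0.8, 0.8, 0.30)
```

With the same hypothetical test, the NPV is about 64% when 69% of patients have residual disease but about 90% when only 30% do.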
In another study, quantitative dynamic 18F-FDG PET performed 2 wk after chemoradiotherapy in a cohort of 29 patients with 30 lesions demonstrated a correlation between the residual rate of glucose metabolism, as estimated from 18F-FDG kinetics, and the pathologic tumor response (44). In that series, a complete pathologic response was achieved in 47% of lesions, and there was a significant reduction in glucose use (P = 0.011). A larger retrospective study involving 56 patients, 33 of whom received neoadjuvant chemotherapy and 23 of whom received chemoradiation, revealed a nearly linear correlation between the change in the SUVmax and the percentage of nonviable tumor in the resected material (r² = 0.75, P < 0.001) (45). In that study, a reduction in the SUVmax of greater than 80% was able to predict a complete pathologic response with an accuracy of 96%. Again, the prevalence of a complete pathologic response was relatively low, at only 34%. Importantly, the area under the receiver operating characteristic curve for the percentage reduction in the SUVmax on 18F-FDG PET was significantly larger than that for the percentage change in CT volume in predicting a complete pathologic response (0.935 vs. 0.53; P < 0.001).
A more recent study in Japan that involved less stringent criteria for a pathologic response and dual-phase PET (early scanning at 1 h and delayed scanning at 2 h) both before and after the completion of chemoradiotherapy found that SUVs on both early and delayed scans after treatment were significantly lower in pathologic responders than in nonresponders (P = 0.0005 and P = 0.0015, respectively) (46). Similarly, a study evaluating the utility of 18F-FDG PET/CT in assessing the response to neoadjuvant chemotherapy or chemoradiotherapy found a significantly greater percentage decrease in the SUVmax in patients showing an excellent pathologic response in the primary tumor than in those with greater than 10% residual viable cells (P < 0.005) (38). In primary tumors smaller than 7.5 cm³, the posttreatment SUVmax was significantly lower in responders than in nonresponders, whereas this relationship was lost for larger tumors. With respect to the pathologic response in mediastinal nodes, a residual SUVmax of 4.1 provided the best discrimination between responders and nonresponders (P = 0.0005). The complete pathologic response rate in both the primary tumor and resected nodes was only 27%.
A study evaluating the pathologic response to induction therapy in 37 of 47 patients who received induction chemotherapy (supplemented in some cases by additional radiotherapy) and then underwent resection found that the SUV was higher in viable residual tumors than in tumors without histologic proof of viable cells (P = 0.006) (47). In the subgroup of patients with residual viable tumors, the median SUVs for patients showing a partial response on CT (on the basis of WHO criteria) and patients with stable disease were 4.7 and 11.8, respectively. However, there appeared to be no substantial difference in SUVs between the various categories of response on CT in patients with no viable cells; the median SUV in this group of patients was 2.2. In another study involving patients treated with neoadjuvant chemotherapy, Dooms et al. found that patients with persistent major mediastinal nodal involvement on 18F-FDG PET had a 5-y overall survival rate of 0% (48). The major pathologic response rate in that series was relatively high, at 70%. Interestingly, as discussed later, survival was still predicted by a change in the SUV in patients who showed a good pathologic response.
Against this generally encouraging series of studies, which appeared to indicate that 18F-FDG PET can fairly reliably predict a favorable pathologic response, a retrospective study at the Memorial Sloan-Kettering Cancer Center involving 56 patients receiving a range of neoadjuvant therapies, including chemotherapy, chemoradiation, and radiotherapy alone, suggested that PET overstaged nodal status in 33% of the patients (49). However, in that study, only 14 of the 56 patients underwent a baseline evaluation, limiting the ability to assess both the percentage reduction in the SUV and the qualitative response. As demonstrated in the study of Dooms et al. (48), 18F-FDG PET may provide prognostic information above and beyond that provided by pathology. Another study (50), confined to patients scanned 2 wk after the last cycle of chemotherapy, found that a 50% reduction in the SUVmax poorly predicted a major pathologic response, which was seen in 25% of specimens. Although all 5 patients with a major pathologic response reached this threshold, so did 8 patients without one.
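The counts quoted for the 50% SUVmax-reduction threshold in reference 50 can be turned into a sensitivity and a positive predictive value. This shows why a threshold that captures every responder can still predict poorly. The sketch uses only the counts in the text. Specificity is not computed because the total number of nonresponders is not given here.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true responders that reached the threshold."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Fraction of patients reaching the threshold who were true responders."""
    return tp / (tp + fp)


# Counts from the text (50): all 5 major pathologic responders reached the
# 50% SUVmax-reduction threshold (tp=5, fn=0); 8 nonresponders also did (fp=8).
tp, fn, fp = 5, 0, 8
print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # sensitivity = 1.00
print(f"PPV = {ppv(tp, fp):.2f}")                  # PPV = 0.38
```

Perfect sensitivity coexists with a PPV of roughly 38%. Fewer than half of the patients meeting the threshold had a major pathologic response.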
Use of Prognostic Stratification to Validate Performance of 18F-FDG PET in Response Assessment
In patients with locally advanced NSCLC, especially those with mediastinal lymph node involvement (stage IIIA), surgery is usually contraindicated. Neoadjuvant chemotherapy or chemoradiotherapy has been proposed for such patients in the hope that, if it eradicates viable tumor cells in involved nodes, the patients may then be curable by surgery. However, given that relatively few patients with locally advanced NSCLC are currently cured, the ability of diagnostic tests to predict the duration of survival is an important measure of therapeutic efficacy and may help to better select patients for salvage or palliative therapies. The ability of 18F-FDG PET to provide prognostic information was demonstrated in a pilot study involving 15 patients receiving induction chemotherapy (n = 9) or radiotherapy (n = 6) (51). All 7 patients with persistently elevated 18F-FDG uptake in the mediastinum developed early systemic disease and died, whereas 7 of the 8 patients with negative mediastinal PET results remained free of extracerebral relapse. Even with this small number of patients, those with PET downstaging had significantly longer cumulative survival than those with a persistent mediastinal nodal abnormality (P = 0.014), whereas a partial response on CT was not predictive of outcome. There was also a significant difference in survival between patients with and patients without a 50% reduction in the SUVmax of the primary tumor (P = 0.03).
In a larger prospective study (21), a metabolic response to chemoradiation, as assessed by visual analysis of 18F-FDG PET results, was also much more strongly correlated with survival than was the response on CT determined from WHO criteria. In 73 patients evaluated with both PET and diagnostic CT scans before and at a median interval of 70 d after treatment, PET metabolic and CT morphologic response categories were identical in only 40% of patients, with significantly more patients showing a CMR (n = 34) than showing a complete response on CT (n = 10). In a multivariate analysis including the known prognostic factors of a response on CT, performance status, weight loss, and stage, only the metabolic response was significantly associated with survival (P < 0.0001). In that series (21), patients showing a CMR survived longer than those showing a PMR, and the latter, in turn, showed survival superior to that of nonresponders (SMD or PMD); these results suggested that a reduction in the SUVmax might have further stratified patients who did not show a CMR.
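The statement that PET and CT response categories were "identical in only 40% of patients" is a simple percent agreement. The sketch below computes it, assuming PET categories are paired one-to-one with CT categories (CMR with CR, PMR with PR, SMD with SD, PMD with PD). The five-patient example is hypothetical.

```python
from typing import Sequence

# Assumed one-to-one pairing of PET metabolic and CT morphologic categories.
PET_TO_CT = {"CMR": "CR", "PMR": "PR", "SMD": "SD", "PMD": "PD"}


def percent_agreement(pet: Sequence[str], ct: Sequence[str]) -> float:
    """Fraction of patients whose PET category maps to the same CT category."""
    if len(pet) != len(ct) or not pet:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(PET_TO_CT[p] == c for p, c in zip(pet, ct))
    return matches / len(pet)


# Hypothetical five-patient example (not data from the study).
pet = ["CMR", "CMR", "PMR", "SMD", "CMR"]
ct = ["PR", "CR", "PR", "SD", "PR"]
print(percent_agreement(pet, ct))  # 0.6
```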
A study of an expanded cohort of 88 patients showed that relapse with distant metastases was significantly less common in patients who attained a CMR after 70 d than in those who attained a PMR (52). It is unclear whether this finding reflects the efficacy of the primary treatment for controlling microscopic disease beyond the initial treatment volume or simply the benefit of better local control in preventing subsequent metastatic spread. A study in Germany (53) involving 70 patients undergoing neoadjuvant chemoradiotherapy found that patients with either a CMR, as determined by qualitative criteria, or an 80% reduction in the SUVmax had significantly longer survival than patients with a PMR (P = 0.0001). Progressive disease on 18F-FDG PET was associated with an unfavorable outcome (P = 0.005). In that study (53), the ability of a patient to undergo curative resection was also prognostically significant (P < 0.001). Importantly, in that study, 18% of potentially eligible patients were excluded because of the detection of systemic metastases. In a study evaluating both the histopathologic response to neoadjuvant chemoradiation and survival, both the pathologic response and the response on 18F-FDG PET appeared to provide prognostic information (38). With a 45% to 55% reduction in the SUVmax of the primary tumor as the cutoff, patients with a more marked metabolic response had a survival rate of 83% at 16 mo, compared with 43% for patients with smaller reductions (P = 0.03).
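Survival rates at a fixed time point, such as the 83% and 43% at 16 mo quoted above, are normally read from a censored survival curve. The text does not name the estimator. The sketch below assumes the standard Kaplan-Meier (product-limit) estimator and uses hypothetical follow-up data. The log-rank test that usually supplies the P values is not shown.

```python
from typing import Sequence


def km_survival_at(times: Sequence[float], events: Sequence[bool], t: float) -> float:
    """Kaplan-Meier (product-limit) estimate of survival S(t).

    times:  follow-up in months for each patient
    events: True if death was observed, False if the patient was censored
    """
    # Distinct times at which at least one death occurred, up to and including t.
    event_times = sorted({ti for ti, e in zip(times, events) if e and ti <= t})
    s = 1.0
    for u in event_times:
        # Patients still under observation just before time u (censored at u count).
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == u)
        s *= 1.0 - deaths / at_risk
    return s


# Hypothetical cohort: follow-up (months) and whether death was observed.
times = [4, 6, 8, 10, 12, 20]
events = [True, False, True, False, True, False]
print(round(km_survival_at(times, events, 16.0), 4))  # 0.3125
```

In a stratified analysis, the same estimate would be computed separately for metabolic responders and nonresponders at the chosen time point.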
18F-FDG uptake in inflamed normal tissues must be considered when 18F-FDG PET is used for response assessment after radiotherapy. Serial imaging during and after radiotherapy suggests that inflammatory 18F-FDG uptake in normal tissues increases in the first few months after treatment rather than occurring early during radiotherapy (41). However, these delayed changes need not prevent an experienced observer from correctly assessing a treatment response visually. Indeed, a positive correlation between 18F-FDG uptake in normal tissues and the probability of a response to therapy has been reported (22). Accurate region-of-interest assignment is critical when the SUV is used to assess the response after radiotherapy because uptake in the uninvolved lung may be in the range considered to be malignant (SUV, >2.5). In a study evaluating secondary metabolic changes in the lung parenchyma of 101 patients undergoing radiotherapy for esophageal cancer, the irradiated lung demonstrated an average SUVmax of 4.96 (range, 1.84–18.2) (54).
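The SUV thresholds discussed here (for example, SUV > 2.5) refer to a normalized uptake value. The text does not define which normalization was used. The sketch below assumes the common body-weight SUV, with a tissue density of 1 g/mL and an image activity concentration that has been decay-corrected to scan time. The function name suv_bw and the example numbers are hypothetical.

```python
F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def suv_bw(tissue_kbq_per_ml: float, injected_mbq: float, weight_kg: float,
           minutes_injection_to_scan: float = 0.0) -> float:
    """Body-weight SUV = tissue concentration / (injected activity / body weight).

    Assumes tissue density of 1 g/mL and an image concentration that is
    decay-corrected to scan time (hypothetical helper, not a scanner API).
    """
    # Decay the injected activity forward to scan time so that numerator and
    # denominator refer to the same moment.
    decay = 0.5 ** (minutes_injection_to_scan / F18_HALF_LIFE_MIN)
    dose_kbq = injected_mbq * 1000.0 * decay
    weight_g = weight_kg * 1000.0
    return tissue_kbq_per_ml / (dose_kbq / weight_g)


# Hypothetical patient: 350 MBq injected, 70 kg body weight, no decay correction.
print(suv_bw(12.5, 350.0, 70.0))  # 2.5
```

With these assumed values, a region containing 12.5 kBq/mL sits exactly at the SUV 2.5 level often taken as suggestive of malignancy. This is why careful region-of-interest placement matters in irradiated lung.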
The likelihood of a CMR appears to be even lower with neoadjuvant chemotherapy than with chemoradiotherapy regimens and, because an early result is often required to select patients to proceed to surgery, quantitative or semiquantitative methods have been advocated to evaluate various degrees of a PMR. Most studies involving serial SUV estimation of a response to induction chemotherapy have shown that the response on PET correlates with the pathologic response, survival, or both. However, the criteria used to define the response have varied from study to study. In 47 patients for whom resection was planned, all but 1 of whom had received chemotherapy, the metabolic response was strongly predictive of survival (47). Patients survived for more than 56 mo when the posttreatment SUVmax was less than 4 but only 19 mo when it was 4 or greater (P < 0.01). That study (47) also revealed a relationship between the posttreatment SUV of the primary lesion and the likelihood of residual disease on pathology. In patients with an incomplete metabolic response, there was an increasing incidence of unexpected metastatic disease. Of 9 patients who were found to have unexpected metastases, 6 were among the 9 patients with stable disease on CT and 3 were among the 35 patients with a partial response determined from WHO criteria. These data suggested that local control is important in preventing progressive metastatic disease.
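The argument that local control limits metastatic progression rests on the difference between the two groups quoted above. The arithmetic below uses only the counts in the text (47) and expresses them as proportions.

```python
from fractions import Fraction


def incidence(events: int, total: int) -> Fraction:
    """Exact proportion of patients in a group who had the event."""
    return Fraction(events, total)


# Counts quoted in the text (47): unexpected metastases by CT response category.
stable = incidence(6, 9)    # 6 of the 9 patients with stable disease on CT
partial = incidence(3, 35)  # 3 of the 35 patients with a partial response on CT

print(f"stable disease:   {float(stable):.1%}")    # 66.7%
print(f"partial response: {float(partial):.1%}")   # 8.6%
print(f"ratio: {float(stable / partial):.1f}")     # 7.8
```

Unexpected metastases were therefore roughly 8 times as frequent among patients with stable disease on CT as among those with a partial response.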
In 51 patients evaluated after 2 or 3 cycles of chemotherapy, given either as induction therapy or as palliative treatment, the rate of glucose metabolism determined from dynamic 18F-FDG imaging was compared with the reduction in the SUV (55). With cutoff thresholds of 47% for the reduction in the rate of glucose metabolism and 35% for the decrease in the SUV, significant prognostic stratification was obtained for overall survival (P = 0.017 and P = 0.018, respectively) and for progression-free survival (P = 0.002 and P = 0.0009, respectively). As described earlier, a detailed study of serial 18F-FDG PET scans obtained early during chemotherapy for 16 patients demonstrated that changes might be observed in the SUV as early as 1 wk after the initiation of treatment (40). In that trial, a reduction in the SUV of greater than 50% between week 1 and week 3 predicted survival for more than 6 mo, whereas a reduction of less than 50% was associated with death within 6 mo. Similarly, in a larger study in Germany, 57 patients with stage IIIB and IV NSCLC were evaluated before and after the first cycle of chemotherapy (56). With a response criterion of a 20% reduction in the SUV, significantly longer progression-free survival (P = 0.0003) and overall survival (P = 0.005) were observed in metabolic responders than in nonresponders. A close correlation was observed between the metabolic response and the best RECIST-based response achieved (P < 0.0001).
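The studies above differ mainly in the percentage-reduction threshold applied (20%, 35%, 50%). The sketch below shows how the same pair of scans can be classified differently under each threshold. It assumes the threshold is applied as "at least" the stated reduction. Individual studies may have used strict inequalities. The SUV values are hypothetical.

```python
def percent_reduction(baseline: float, follow_up: float) -> float:
    """Percentage decrease in SUV from baseline (negative if uptake rose)."""
    if baseline <= 0:
        raise ValueError("baseline SUV must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def is_metabolic_responder(baseline: float, follow_up: float,
                           threshold_pct: float) -> bool:
    """True if the SUV fell by at least threshold_pct percent (assumed rule)."""
    return percent_reduction(baseline, follow_up) >= threshold_pct


# Hypothetical scans: SUVmax 10.0 at baseline and 6.0 after treatment.
for threshold in (20, 35, 50):
    print(threshold, is_metabolic_responder(10.0, 6.0, threshold))
# 20 True
# 35 True
# 50 False
```

A 40% reduction counts as a response under the 20% and 35% criteria but not under the 50% criterion. This is one reason results from studies using different definitions are hard to pool.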
The importance of a response in mediastinal lymph nodes was assessed in a prospective multicenter trial in Europe (57). In 47 patients who had baseline scans and further evaluation after 1 and 3 cycles of induction chemotherapy, the rate of glucose metabolism and semiquantitative measures of glucose use derived from 18F-FDG PET were evaluated, with survival as the endpoint. Residual 18F-FDG uptake after treatment was the strongest prognostic factor. Patients showing a CMR in mediastinal nodes on 18F-FDG PET had longer survival than patients with residual N2 or N3 disease (P = 0.035). A decrease in 18F-FDG uptake of 30% or more discriminated a group of patients with superior survival. With a threshold rate of glucose metabolism of 0.13 μmol/mL/min after 3 cycles of chemotherapy, patients were strongly stratified for survival (P = 0.0003). The same threshold after a single cycle of chemotherapy also predicted survival (P = 0.007). Encouragingly for the logistics of therapeutic response assessment, that study (57) showed a good correlation between the rate of glucose metabolism estimated from dynamic 18F-FDG imaging and the SUV.
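The text does not say how the rate of glucose metabolism (in μmol/mL/min) was derived from dynamic imaging. One widely used approach is Patlak graphical analysis. The slope Ki of Ct(t)/Cp(t) plotted against ∫Cp dτ / Cp(t) is scaled by plasma glucose divided by a lumped constant. The sketch assumes that method. The synthetic input curve, plasma glucose, and lumped constant value below are hypothetical.

```python
from typing import Sequence


def patlak_ki(t: Sequence[float], cp: Sequence[float], ct: Sequence[float],
              t_start: float) -> float:
    """Least-squares slope Ki of the Patlak plot, using frames with t >= t_start.

    t:  frame mid-times (min); cp: plasma input (kBq/mL); ct: tissue (kBq/mL)
    """
    # Cumulative trapezoidal integral of the plasma input function.
    integral = [0.0]
    for i in range(1, len(t)):
        integral.append(integral[-1] + 0.5 * (cp[i] + cp[i - 1]) * (t[i] - t[i - 1]))
    idx = [i for i in range(len(t)) if t[i] >= t_start]
    if len(idx) < 2:
        raise ValueError("need at least two frames after t_start")
    xs = [integral[i] / cp[i] for i in idx]  # "normalized time"
    ys = [ct[i] / cp[i] for i in idx]        # tissue-to-plasma ratio
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def mr_glu(ki: float, glucose_umol_per_ml: float, lumped_constant: float) -> float:
    """Metabolic rate of glucose (umol/mL/min) = Ki * plasma glucose / LC."""
    return ki * glucose_umol_per_ml / lumped_constant


# Synthetic data built from the Patlak model (constant input, Ki = 0.02/min).
t = [0.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
cp = [10.0] * len(t)
ct = [0.2 * x + 3.0 for x in t]
ki = patlak_ki(t, cp, ct, t_start=10.0)
print(round(ki, 4))                        # 0.02
print(round(mr_glu(ki, 5.5, 1.0), 4))      # 0.11 (hypothetical glucose and LC)
```

Because the static SUV is effectively a single late sample of the same uptake process, a good correlation between the two measures is plausible, as the study reported.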
In the study of Dooms et al. (48), the interrelationship between a mediastinal nodal response and a metabolic response in predicting survival was analyzed in 30 patients receiving induction chemotherapy. Among patients with a major pathologic response in the nodes, the 5-y overall survival rate was significantly higher when the SUVmax of the primary tumor decreased by more than 60% than when it decreased by less than 60% (62% vs. 13%; P = 0.002). The authors (48) concluded that surgical candidates should be selected from patients with mediastinal downstaging or persistent minor nodal disease, as indicated by mediastinoscopy, and that patients with persistent major mediastinal nodal disease should not undergo surgery. Whether PET evaluation could have provided an adequate noninvasive evaluation of residual disease in the mediastinum was not addressed in that study (48). A recent study in Belgium involving 31 patients with locally advanced disease (stages IIIA and IIIB) found that patients showing a CMR had significantly longer median overall survival than patients not showing a CMR (>49 mo vs. 14.4 mo; P = 0.004) (58).
As with the data on the ability of 18F-FDG PET to predict a histopathologic response, there is, amid all these encouraging data, an important outlier that has brought the utility of metabolic response assessment into question. A recently reported study involving 89 patients from 2 consecutive phase II clinical trials of chemotherapy for NSCLC found that the response on 18F-FDG PET was unable to predict outcome, whereas the response determined from RECIST was associated with overall survival (59). That study included patients at a less advanced stage than previous studies evaluating 18F-FDG PET for monitoring neoadjuvant chemotherapy. Patients with a complete (n = 1) or partial (n = 32) response, as determined from RECIST, had not reached a median survival at 48 mo, whereas those with stable or progressive disease (n = 56) had a median survival of 36 mo (P = 0.04). By contrast, visual analysis of PET scans, carried out between 1 and 13 d after the last dose of chemotherapy and a few weeks before surgery, revealed no significant difference in survival between metabolic responders and nonresponders. An extremely unusual finding for any therapeutic monitoring trial comparing anatomic and metabolic responses was that anatomic responders outnumbered metabolic responders (33 vs. 28). Moreover, review of the waterfall plot revealed virtually no relationship between morphologic response measures and the percentage reduction in the SUVmax on PET, although a weak correlation between visual and semiquantitative responses was reported. In other studies, there is usually a strong correlation between a metabolic response and an eventual radiologic response (56). However, a metabolic response tends to precede a RECIST-based response and is usually more marked, because some tumors heal by fibrosis, which can leave a residual mass on CT, rather than by necrosis or apoptosis. SUV data were available for only 59 patients in that study (59).
For either a 30% or a 50% reduction in the SUV of the lesion with the most intense uptake, no relationship with survival was observed. Survival was actually longer in patients without a response on PET than in those with a response on PET, although this finding was not statistically significant.
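A patient-by-patient comparison of anatomic and metabolic responses, such as a waterfall plot, requires an anatomic category for each patient alongside the SUV change. The sketch below applies the RECIST 1.0 target-lesion rules to the sum of longest diameters. These rules are complete response when the lesions disappear, partial response at a decrease of at least 30% from baseline, and progression at an increase of at least 20% over the smallest recorded sum. It deliberately ignores new and nontarget lesions, so it is a simplification and not a full RECIST implementation. The values are hypothetical.

```python
from typing import Optional


def recist_category(baseline_sum_mm: float, current_sum_mm: float,
                    nadir_sum_mm: Optional[float] = None) -> str:
    """Simplified RECIST 1.0 category from sums of longest diameters (mm).

    Ignores new lesions and nontarget lesions. If no nadir is supplied,
    the baseline sum is used as the reference for progression.
    """
    if baseline_sum_mm <= 0:
        raise ValueError("baseline sum must be positive")
    nadir = baseline_sum_mm if nadir_sum_mm is None else nadir_sum_mm
    if current_sum_mm == 0:
        return "CR"  # disappearance of all target lesions
    # Progression: >= 20% increase over the smallest sum recorded
    # (any regrowth after a prior complete response also counts).
    if nadir == 0 or (current_sum_mm - nadir) / nadir >= 0.20:
        return "PD"
    # Partial response: >= 30% decrease from the baseline sum.
    if (baseline_sum_mm - current_sum_mm) / baseline_sum_mm >= 0.30:
        return "PR"
    return "SD"


# Hypothetical patients: baseline 100 mm, various follow-up sums.
for follow_up in (0.0, 70.0, 85.0, 120.0):
    print(follow_up, recist_category(100.0, follow_up))
# 0.0 CR
# 70.0 PR
# 85.0 SD
# 120.0 PD
```

Pairing each category with the percentage reduction in SUVmax for the same patient, then sorting by either measure, gives the data underlying a waterfall comparison.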
How can such marked discordance, not only with the substantial number of therapeutic monitoring studies for NSCLC but also with the literature comparing metabolic and morphologic responses across a broad range of malignancies, be explained? Performing 18F-FDG PET too close to the last dose of chemotherapy would tend to overestimate a response rather than underestimate it, whereas underestimation appears to have been the problem in the study of Tanvetyanon et al. (59). Perhaps analyzing only the lesion with the most intense uptake was the problem. Such an approach could significantly impair interpretation and the assignment of appropriate regions of interest for semiquantitative analysis if inflammatory changes or reactive lymphadenopathy were not recognized as such by pattern recognition. Incidental foci of abnormal 18F-FDG uptake are common but are generally well recognized by experienced readers (60). A further factor in the study of Tanvetyanon et al. (59) may have been the relatively low disease burden, which would make the adequacy of resection more important for outcome than the control of disease beyond the resection margins. Indeed, a clinical report (61) on a subgroup of the patients involved found that the major prognostic factor was whether complete or incomplete resection was achieved (P = 0.004), whereas a pathologic response predicted neither overall survival (P = 0.25) nor progression-free survival (P = 0.16).
CONCLUSION
The aggregated data on the use of 18F-FDG PET and PET/CT in therapeutic response assessment strongly indicate that a reduction in tissue 18F-FDG retention, however it is measured and at whatever time after treatment it is recorded, is more likely to be associated with both a pathologic response and improved survival (Table 1) than is a lack of change. The diversity of definitions of a metabolic response, combined with the variability of acquisition and processing protocols, is not surprising because PET has undergone rapid technologic evolution over the last 10 y. Nevertheless, the lack of reproducibility and standardization of the measures of response and the poor harmonization of response criteria are impediments to the qualification of 18F-FDG PET as a biomarker. Although many groups are working actively to achieve these objectives, there is a risk that response criteria established now may become inappropriate as the technology continues to advance. Accordingly, outside clinical trials, a metabolic response should be evaluated in the context of all available information, including the type and duration of treatment, the time elapsed between the last treatment and the PET evaluation, clinical evidence of toxicity and response, and the expected likelihood of a cure on the basis of risk factors and published response rates. The optimal timing of therapeutic response assessment is, in theory, that which provides the most robust assessment of a response and the greatest opportunity to alter treatment in nonresponders. More robust response assessment should reduce the cost and morbidity associated with ineffective treatment and increase the opportunity to institute effective salvage treatment.
Although expensive compared with CT, serial 18F-FDG PET could have practical value by facilitating response-adapted therapy and could actually reduce overall health care costs by leading to more cures and diminishing the use of and complications associated with ineffective treatment. For clinical trials, prospectively defined response criteria will be needed but should be rationally designed.
The studies detailed here evaluated conventional cytotoxic therapies, including chemotherapy and radiotherapy. Molecularly targeted therapies are now increasingly being used for cancer. Inhibitors of epidermal growth factor receptor signaling are already in widespread use. Preclinical studies have suggested that metabolic responses may be profound (62). Clinical studies are in progress with these agents (Fig. 4). The response of NSCLC to these and other novel therapies may not change 18F-FDG kinetics either to the same degree or with the same temporal profile as more conventional therapies. Again, caution is recommended before response criteria that may not fulfill the longer-term needs of either patients or the community are established. Additionally, it must be recognized that PET is only one of several potential biomarkers of therapeutic response and prognosis. Improved molecular biology techniques will be important complementary tools and may be more sensitive than imaging in detecting residual disease, as demonstrated by recent studies of circulating cancer cell detection (63).
Footnotes
COPYRIGHT © 2009 by the Society of Nuclear Medicine, Inc.
References
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35.
- 36.
- 37.
- 38.
- 39.
- 40.
- 41.
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52.
- 53.
- 54.
- 55.
- 56.
- 57.
- 58.
- 59.
- 60.
- 61.
- 62.
- 63.
- Received for publication December 24, 2008.
- Accepted for publication February 25, 2009.