Abstract
PET and PET/CT are widely used for surveillance of patients after cancer treatment. We conducted a systematic review to assess the diagnostic accuracy and clinical impact of PET and PET/CT used for surveillance in several cancers. Methods: We searched MEDLINE and Cochrane Library databases from 1996 to March 2012 for English-language studies of PET or PET/CT used for surveillance of patients with lymphoma, colorectal cancer, or head and neck cancer. We included prospective or retrospective studies that reported test accuracy and comparative studies that assessed clinical impact. Results: Twelve studies met our inclusion criteria: 6 lymphoma (n = 767 patients), 2 colorectal cancer (n = 96), and 4 head and neck cancer (n = 194). The studies lacked a uniform definition of surveillance and used heterogeneous scanning protocols. More than half the studies were retrospective, and a quarter were rated as low quality. The majority reported sensitivities and specificities in the range of 90%–100%, although several studies reported lower results. The only randomized controlled trial, a colorectal cancer study with 65 patients in the surveillance arm, reported earlier detection of recurrences with PET and suggested improved clinical outcomes. Conclusion: There is insufficient evidence to draw conclusions on the clinical impact of PET or PET/CT surveillance for these cancers. The lack of standard definitions of surveillance, the heterogeneous scanning protocols, and inconsistencies in reporting test accuracy preclude making an informed judgment on the value of PET for this potential indication.
PET using the glucose analog 18F-FDG has become an important modality for cancer imaging because of the characteristically increased use of glucose by malignant cells. Since its introduction in 2000, PET/CT has progressively replaced conventional PET, and nearly all scanners now in use worldwide are PET/CT scanners (1). Compared with conventional PET, PET/CT provides greater accuracy in localizing 18F-FDG uptake, with resultant improvement in observer performance (2,3). Hereafter in this article, the term PET will be used to refer to PET or PET/CT; distinctions will be made where needed.
PET is used in many cancers for diagnosis, initial staging, assessment of treatment response (4,5), restaging, detection of clinically suspected recurrence, and surveillance (6–9). Using advanced imaging, including PET, for posttreatment surveillance of patients is controversial and generally not recommended for most cancers (10,11). The widely held, albeit anecdotal, impression is that surveillance PET imaging is common, but there are few published estimates of utilization rates for this indication (12). The National Oncologic PET Registry does not specifically gather data on the use of PET for surveillance purposes (13). Although systematic reviews have been conducted for a range of PET uses, none have focused on the use of PET for surveillance (14,15).
A common conceptual framework for evaluating diagnostic test technologies categorizes studies into 6 assessment levels (16). In this systematic review, we searched for evidence to assess the diagnostic accuracy and clinical impact of surveillance PET (i.e., impact of scans on use of other diagnostic tests, impact on therapeutic decisions, and effect on patient outcomes). We focused a priori on lymphoma, colorectal cancer, and head and neck cancer, as these have the most studies and, in our experience, the largest numbers of patients undergoing posttreatment surveillance. We also gathered data from studies that did not meet the inclusion criteria to inform future research recommendations.
MATERIALS AND METHODS
In carrying out this systematic review, we adhered to the PRISMA statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) (17).
Literature Search Strategy
We searched MEDLINE and the Cochrane Central Register of Controlled Trials from 1996 to March 2012 for English-language studies examining the use of PET in lymphoma, colorectal cancer, and head and neck cancer. We looked for additional studies by reviewing the reference lists of studies that met our inclusion criteria and of relevant Cochrane systematic reviews. A variety of keywords and Medical Subject Heading (MeSH) terms were used, including terms describing PET devices and terms related to surveillance (e.g., monitoring and follow-up).
Study Selection
Abstracts were reviewed for eligibility by 1 of 4 authors, and questionable studies were adjudicated by all authors. Surveillance imaging was defined as imaging performed at least 6 mo after completion of treatment with curative intent in patients who were considered to be disease-free by clinical examination or other imaging at the time of PET. We included reports evaluating patients with lymphoma, colorectal cancer, or head and neck cancer at any pretreatment cancer stage. Studies were excluded if results were not reported separately for patients considered to be disease-free or if patients had any clinical signs or symptoms suggesting recurrent disease. Scans could be performed once or on a periodic schedule. Acceptable reference standards for recurrence included histology, other imaging modalities, laboratory tests, clinical examination, or a combination of these as defined by the study authors.
For studies of test accuracy, we included prospective or retrospective studies. We accepted studies that used either individual patients or individual scans as the unit of analysis and either reported test accuracy (e.g., sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, and negative likelihood ratio) or presented data in 2 × 2 tables that allowed calculation of test accuracy. For studies assessing clinical impact, we considered only comparative studies.
Data Extraction and Calculation of Test Accuracy
Data from each study were extracted by one of us and confirmed by another; discrepancies were reconciled by three of us. Information was collected on cancer type, patient characteristics, details of the surveillance protocol, the reference standard used, and relevant measures of diagnostic accuracy and clinical impact. Although some studies performed surveillance scans at more than one time point, test accuracy metrics were typically not reported for all time points, and surveillance protocols often were unclear as to which patients were included in later scans. Thus, for each study, we extracted data for the first time point at which surveillance scans occurred, at a minimum of 6 mo after treatment completion. Where possible, we computed the “yield” of surveillance scanning, defined as the percentage of positive results (true-positive plus false-positive) in the scanned population. When not provided by the study, test accuracy measures (sensitivity, specificity, positive and negative predictive values, and likelihood ratios) and confidence intervals were calculated using Stata, version 11.0 (StataCorp LP).
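For clarity, the standard definitions underlying these calculations are summarized below in our own notation, where TP, FP, FN, and TN denote the true-positive, false-positive, false-negative, and true-negative cells of the 2 × 2 table; these formulas are generic and are not taken from any individual study.

\[
\text{sensitivity}=\frac{TP}{TP+FN},\qquad
\text{specificity}=\frac{TN}{TN+FP},\qquad
\text{PPV}=\frac{TP}{TP+FP},\qquad
\text{NPV}=\frac{TN}{TN+FN}
\]
\[
LR^{+}=\frac{\text{sensitivity}}{1-\text{specificity}},\qquad
LR^{-}=\frac{1-\text{sensitivity}}{\text{specificity}},\qquad
\text{yield}=\frac{TP+FP}{TP+FP+FN+TN}
\]

As a purely hypothetical example, a set of 100 scans with TP = 9, FP = 10, FN = 1, and TN = 80 would give a sensitivity of 90%, a specificity of 89%, a positive predictive value of 0.47, a negative predictive value of 0.99, and a yield of 19%.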
Study Quality Assessment
We extracted information on study design, conduct, and reporting and used the Quality Assessment of Diagnostic Accuracy Studies tool (18) to evaluate the quality of the studies assessing test accuracy. For comparative studies reporting on clinical impact outcomes, we combined this tool with selected items from the Cochrane Risk of Bias tool (19) that were applicable to diagnostic testing studies. The primary data extractor assessed the study quality, and another reviewer confirmed the quality grade.
We rated each study using an “A,” “B,” or “C” letter grade according to predefined criteria. Quality A studies adhered to recognized standards of conduct for diagnostic test studies and provided clear descriptions of the design, population, test, reference standard, and outcomes; they also had no major reporting omissions or errors and no obvious source of bias. Quality B studies had some deficiencies in these criteria, but the deficiencies were considered unlikely to result in a major bias (retrospective studies were graded no higher than B). Quality C studies had serious design or reporting deficiencies.
Results are summarized by cancer type, and separately for PET and PET/CT. Although we reported information on quality C studies, we drew test accuracy conclusions only from quality A and B studies.
RESULTS
The literature search yielded 1,813 citations, from which 146 full-text articles were evaluated (Fig. 1). Twelve studies (7 PET, 5 PET/CT) met our inclusion criteria and provided test accuracy data. One randomized controlled trial provided data on impact on therapeutic decision making and clinical outcomes (20). Studies were most often excluded for failing to meet our definition of surveillance, typically because scans were performed less than 6 mo after completion of treatment or were performed to assess treatment response or for restaging.
Literature flow of evaluated abstracts, retrieved studies, and included studies.
Table 1 shows the characteristics of the included studies: 6 lymphoma studies (2 PET, 4 PET/CT), 2 PET studies in colorectal cancer, and 4 in head and neck cancer (3 PET, 1 PET/CT). All 5 PET/CT studies used CT for attenuation correction and localization of PET findings. One study used contrast-enhanced CT for diagnostic purposes (21).
Characteristics of Studies Evaluating Surveillance PET or PET/CT
There was no standard definition of surveillance across all studies or within cancer types, nor was there a consistent schedule for repeated scans. The duration between the final surveillance PET and the last clinical follow-up examination ranged from 2.3 to 31 mo. Patients were scanned serially in 7 studies and once in 4 studies; the scanning frequency was indeterminate in 1 study. The reference standard used to verify PET results varied among studies and included CT alone as well as a combination of laboratory and imaging findings and an absence of symptoms.
Although patients were deemed to be disease-free after treatment completion in all studies, 10 studies did not indicate the means by which disease status was confirmed. Patients in 2 studies were deemed disease-free by negative results on PET/CT done for restaging after treatment (22,23).
In colorectal cancer and head and neck cancer, all studies reported diagnostic accuracy using patients as the unit of analysis. Two lymphoma reports used scans as the unit of analysis (24,25). It was unclear how sensitivity and specificity were calculated when a patient had conflicting scan results at 2 different time points (e.g., a negative scan followed by a positive scan) (26).
Table 2 lists our overall quality ratings and the specific grading criteria for each study. One study was rated quality A, 8 were rated quality B, and 3 were rated quality C.
Quality of Surveillance PET and PET/CT Studies
Lymphoma
For lymphoma, there were 4 retrospective PET/CT studies and 2 prospective PET studies. Four were rated as quality B (22–24,26) and 2 as quality C (21,27). The 2 quality C studies are listed in the tables but are not included in the synthesis. Sample sizes of the quality B studies ranged from 27 to 421 patients, for a total of 541 patients. Only 1 study examined children (24).
Table 3 shows diagnostic accuracy by cancer and imaging modality. The 3 quality B PET/CT studies included 120 patients, had a per-patient level sensitivity of 100%, and had specificities ranging from 43% to 92% (22–24). One PET study with 421 patients was rated quality B and reported a sensitivity of 89% and a specificity of 100% (26). Among the 4 lymphoma studies with sufficient data to calculate predictive values, positive predictive values ranged from 0.2 to 1.0 and negative predictive values ranged from 0.98 to 1.0; the yield of positive PET scans in these studies ranged from 9.6% to 63%.
Summary of Test Performance for Surveillance PET/CT and PET
Colorectal Cancer
Two PET studies evaluated patients with colorectal cancer. One was a randomized controlled trial of 130 patients (20) and the other a retrospective study with 31 patients (28). The randomized trial compared a surveillance strategy that included CT at 9 and 15 mo after surgery (n = 65) with a strategy that included both PET and CT scans (n = 65) at the same time points. This trial assessed impact on therapeutic decision making, impact on mortality, and test accuracy. The retrospective study was graded quality C because of likely selection bias.
The randomized trial ended recruitment early because of ethical and methodologic concerns after PET/CT scanning became available at the institution in 2004. For clinical impact outcomes, the study was rated quality A. Using a per-protocol analysis with 60 patients in the PET group (5 fewer than in the intention-to-treat analysis because of missing data) and 65 in the control group, the study found that recurrences were detected sooner after baseline in the PET group (12.1 ± 4.1 mo) than in the control group (15.4 ± 6.0 mo; P = 0.01). Therapy was started sooner, but not significantly so, in the PET group (14.8 ± 4.1 vs. 17.5 ± 6.0 mo, P = 0.09). Surgery for recurrent disease was performed more frequently in the PET group (15 of 23 [65%] vs. 2 of 21 [9.5%], P < 0.0001). Moreover, the frequency of curative resection of recurrences was higher in the PET group (43.8% vs. 9.5%, P < 0.01). Intention-to-treat analyses gave similar results. In the per-protocol analysis, a nonsignificantly greater proportion of patients with recurrences died during the study period (maximum follow-up, 24 mo) in the control group than in the PET group (28.5% vs. 13%, P = 0.33). For assessment of test accuracy, this study was rated quality B, with a sensitivity of 100% and a specificity of 96%. Yield could not be calculated.
Head and Neck Cancer
Patients with head and neck cancer were evaluated by PET/CT in 1 prospective study (25) and by PET in 2 prospective studies (29,30) and 1 retrospective study (31). The PET/CT study was rated quality A; the PET studies, which had unclear reporting and possible selection bias, were rated quality B.
The prospective PET/CT study enrolled 91 patients with squamous cell carcinoma and reported a sensitivity of 100% and a specificity of 85%. The 3 PET studies comprised 103 patients; 2 examined squamous cell carcinoma (30,31), and 1 included all cell types (29). Sensitivities ranged from 75% to 100%, and specificities ranged from 92% to 95%. The 4 head and neck cancer studies had positive predictive values between 0.5 and 0.9, negative predictive values of 1.0, and yields of positive PET scans ranging from 14% to 57%.
Additional Analysis of Studies Not Included in the Review
Less than 10% of the retrieved full-text articles met our inclusion criteria. Table 4 summarizes selected characteristics leading to exclusion. Less than a quarter of the lymphoma and colorectal cancer studies, and roughly half of the head and neck cancer studies, had prospective designs. Less than 15% of the lymphoma and head and neck cancer studies included patients who were considered to be disease-free at the time of imaging, and approximately a quarter of the studies on these cancers described the scans as being for surveillance. In none of the colorectal cancer studies were patients verified to be disease-free, and in only 1 were the scans described as being for surveillance.
Characteristics of Studies Not Included After Full-Text Screening
Several studies met most of the inclusion criteria but failed to either adequately report the surveillance protocol or clearly describe the patient population. For example, one study described scans as being for the purpose of surveillance, but these were performed at a median of 12 wk after treatment completion (and thus would be more properly classified as restaging) (32). Another study performed scans at a median time of 6.6 mo after treatment completion, but the range was 1.6–166 mo and 28 of 35 scans were for suspected recurrence (33).
DISCUSSION
This systematic review of PET for posttreatment surveillance of patients with lymphoma, colorectal cancer, or head and neck cancer found only a single comparative study examining its impact on patient management and few studies assessing test accuracy. The sole randomized trial suggests that PET may have an important clinical impact on therapeutic decision making and may improve patient outcomes when used for surveillance of colorectal cancer, one of the few cancers for which evidence exists supporting intensive posttreatment surveillance (34). Most studies reported sensitivities and specificities in the range of 90%–100%, but some reported much lower values.
Because of the inconsistent definition of surveillance, the variations in imaging protocols, and the small number of studies using a particular imaging modality in a given cancer type, we did not conduct a meta-analysis. In addition, the literature was of limited quality: 7 of 12 studies used a retrospective design, and half lacked masked outcome assessments. The retrospective studies either had no predefined scanning frequency and interval or applied them inconsistently. The prospective studies used widely varying scanning schedules, ranging from multiple scans at 6-mo intervals to a single scan at roughly 2 y after treatment completion.
Our finding of a lack of evidence supporting PET/CT for posttreatment surveillance is reflected in practice guidelines (10,11). Current National Comprehensive Cancer Network guidelines do not recommend PET for surveillance. For head and neck cancer, PET is recommended for restaging in patients with higher-stage disease (stage III or IV) but not thereafter. Similarly, PET is now the standard of care for end-of-treatment response assessment in Hodgkin lymphoma and aggressive non-Hodgkin lymphoma but not for surveillance. The Hodgkin lymphoma guideline explicitly states that surveillance PET should not be done because of the risk of false-positive results, nor is PET recommended in the non-Hodgkin lymphoma guideline. Nevertheless, PET/CT is commonly used for surveillance (35). Possible risks of using PET/CT for surveillance include overtreatment based on false-positive findings and unnecessary radiation exposure (36).
Our review highlights the 2 shortcomings that dominate the surveillance literature. The first is the lack of a common definition of surveillance (specifying the minimum time since last treatment and the absence of clinical or other diagnostic suspicion of recurrence), and the second is the lack of a well-thought-out prospective protocol based on cancer type and stage at last treatment. Testing intervals should be tailored to the relative risk of recurrence, which, in each of these cancers, has been shown to decline over time in its own characteristic pattern.
Few studies met the criteria of our review. However, some of the excluded studies may have included patients who underwent surveillance scans. Because of the limited evidence on surveillance scanning, we collected data from the excluded studies to better understand the characteristics of reports that might still fall under the definition of surveillance scanning. We found that most of these studies did not include patients who were disease-free at the time of the scan, and most did not clearly report the details of the scanning protocol.
Our review had several limitations. Results were difficult to synthesize because of the lack of a standard surveillance definition. Studies were generally of poor quality, with more than half being retrospective. In studies conducting multiple scans as part of a surveillance protocol, we were unable to use all available data because test accuracy was not consistently defined and reporting was incomplete. One study on head and neck cancer included in this review examined the hypothetical therapeutic impact of PET surveillance, but this outcome was not included in our results because of the lack of a comparison group (30). In addition, our review included 2 generations of PET technologies—PET alone and PET/CT. There is substantial evidence across many cancers and indications that PET alone is more sensitive and specific than conventional imaging methods. Thus, even though PET/CT results are usually more accurate than results with PET alone, results from PET-only studies set a baseline of performance that is likely only to be improved by PET/CT. Finally, we did not assess publication bias. Although there is always a concern in systematic reviews that unpublished negative studies may negate the positive results, the paucity of evidence in favor of using surveillance PET lessens this concern. There is also a lack of reliable methods to assess publication bias (37).
Future research should provide detailed descriptions of the surveillance protocols and patient populations; the need for such details has been noted in previous systematic reviews of colorectal cancer surveillance (34,38). Better-conducted studies will help to answer the questions of which patients would benefit most from surveillance (e.g., patients with different disease severities) and which surveillance protocols are most effective for different cancers. Because few randomized trials have been conducted in this area, it is all the more important for prospective trials to clearly report protocols and patient populations so that the efficacy of PET/CT surveillance strategies can be understood. Retrospective studies can help answer the question of PET/CT surveillance test accuracy, but prospective studies are needed to address aspects of clinical impact. As suggested by Baca et al. (38), adapting the parameters of surveillance protocols (such as frequency and duration of surveillance) to patient risk levels is an intriguing study design that would allow a more targeted approach to surveillance. Different cancers with different natural histories may dictate variable surveillance durations, as the benefits and risks of follow-up vary over time (38).
The results of this review point to the need to establish common definitions of surveillance and surveillance protocols. Broadly, surveillance can be defined as evaluation of an asymptomatic patient with no clinical evidence of disease to assess for otherwise occult disease. In addition to a need for improved reporting, there is a need for comparative studies of surveillance that are powered to look at clinically relevant outcomes. Future high-quality prospective studies, including randomized trials, are necessary to answer the question of what role surveillance scanning should play and for what duration in different cancers.
DISCLOSURE
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734. This work was supported in part by a National Cancer Institute Grand Opportunity Grant, RC2CA148259. Barry A. Siegel is on the advisory boards of Radiology Corporation of America, Siemens Molecular Imaging, and GE Healthcare and is a stockholder of Radiology Corporation of America. No other potential conflict of interest relevant to this article was reported.
Footnotes
Published online Jun. 17, 2013.
- © 2013 by the Society of Nuclear Medicine and Molecular Imaging, Inc.
- Received for publication January 9, 2013.
- Accepted for publication March 18, 2013.