Clinical Investigation
1 Department of Radiology, Indiana University School of Medicine, Indianapolis, Indiana; 2 Department of Ophthalmology and Visual Sciences, Washington University School of Medicine, St. Louis, Missouri; 3 Department of Veterans Affairs Palo Alto Health Care System and the Department of Medicine, Stanford School of Medicine, Stanford, California; 4 Department of Radiology, Emory University School of Medicine, Atlanta, Georgia; 5 Department of Radiology, Duke University School of Medicine, Durham, North Carolina; 6 Department of Radiology, Mayo Clinic, Rochester, Minnesota; 7 Department of Radiology, University of Wisconsin School of Medicine, Madison, Wisconsin; 8 Department of Veterans Affairs Palo Alto Health Care System and the Department of Radiology, Stanford School of Medicine, Stanford, California; 9 Department of Medicine, University of Wisconsin School of Medicine, Madison, Wisconsin; and 10 Department of Veterans Affairs Cooperative Studies Program Coordinating Center, Palo Alto, California
Correspondence: For correspondence or reprints contact: James W. Fletcher, MD, Department of Radiology, Indiana/Purdue University, Indiana University School of Medicine, University Hospital, Room 0655, 550 N. University Blvd., Indianapolis, IN 46202-5253. E-mail: jwfletch{at}iupui.edu
| ABSTRACT |
|---|
Key Words: oncology; PET; respiratory; CT; SPN; diagnosis
| INTRODUCTION |
|---|
Diagnosis of lung cancer often begins with identification of a suggestive nodule on chest radiography or CT. CT is considered an excellent tool for detection and localization but has been shown to have poor specificity (58%) (2) for characterization of the nodule. PET with 18F-FDG has been shown in several studies to be a promising adjunct modality (3–7). A recent systematic review reported a pooled sensitivity and specificity of 94.2% and 83.3% (8). However, the authors of this report noted that the component studies were limited by small sample size, incomplete masking, and biased patient selection.
The primary objective of this study was to compare the accuracy of PET and CT in the characterization of pulmonary nodules in a head-to-head prospective study that addressed the methodologic limitations of previous studies.
| MATERIALS AND METHODS |
|---|
The primary entrance criterion for participants was evidence of a new, untreated, solitary pulmonary nodule (SPN) between 7 and 30 mm in size on a posteroanterior and lateral view of plain chest radiography. All SPNs were round or oval, without associated atelectasis or pleural involvement. For eccentric nodules, size was based on mean diameter (adding the major and minor axes and dividing by 2). However, neither axis was less than 7 mm.
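The size rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and treating the 7 mm and 30 mm limits as inclusive is an assumption, since the text does not say how boundary values were handled.

```python
def mean_diameter_mm(major_mm: float, minor_mm: float) -> float:
    """Mean diameter as described in the text: (major axis + minor axis) / 2."""
    return (major_mm + minor_mm) / 2.0


def meets_size_criterion(major_mm: float, minor_mm: float,
                         lower_mm: float = 7.0, upper_mm: float = 30.0) -> bool:
    """Return True if a nodule satisfies the size entrance criterion.

    Assumption: the limits are inclusive (not stated in the source).
    """
    # Neither axis may be less than the lower limit.
    if min(major_mm, minor_mm) < lower_mm:
        return False
    # The mean diameter must fall within the enrollment range.
    return lower_mm <= mean_diameter_mm(major_mm, minor_mm) <= upper_mm
```

For example, an eccentric nodule measuring 20 mm by 10 mm has a mean diameter of 15 mm and would qualify, whereas a 10 mm by 6 mm nodule would not, because one axis is below 7 mm.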
Exclusion criteria included an age of less than 21 y; pregnancy or lactation; a weight of greater than 350–400 lb; intercurrent pulmonary infection; thoracic surgery in the past 6 mo; radiotherapy to the chest in the past year; refusal to undergo biopsy, surgery, or a 2-y clinical follow-up; a life expectancy of less than 2 y if biopsy or surgery was not expected; and involvement in other Veterans Administration cooperative study projects. The SPN identified was referred to as the target nodule. Patients with nodules that were heavily calcified (high likelihood of being benign) or that fell outside the size criterion were not enrolled. Nodules smaller than 7 mm are poorly characterized by PET, and nodules larger than 30 mm are considered to be malignant until proved otherwise.
A composite reference standard was used (10). A malignant diagnosis was established by biopsy or surgical resection of the target nodule. A rating of "definitely malignant" or "definitely benign" by the local pathologist, masked to the CT and PET results, was the reference standard.
When biopsy or surgery was not performed, participants were followed for 2 y, undergoing chest radiography or CT every 6 mo. If the SPN was stable during this period, it was considered to be benign. For this purpose, "stable" was defined as a change in size of less than 1 mm, a decrease in size without treatment at 24 mo, or resolution of the nodule without treatment (11).
All participants underwent CT of the chest and 18F-FDG PET. All centers followed a study protocol for acquisition and processing of the image data for both PET and CT (9). Image quality and resolution of the PET cameras were confirmed by use of a specially fabricated imaging phantom circulated to all study locations. CT examinations were conducted according to guidelines from the American College of Radiology (12). Further details of our imaging protocols can be found in the supplemental data (supplemental materials are available online only at http://jnm.snmjournals.org).
A panel of independent research readers established the index test result. The panel consisted of 3 recognized experts in CT and 3 recognized experts in PET. All were academically based physicians, were board certified in their specialty, and had published widely in their field. None was involved in the care of SNAP participants, and all were unaware of patient history other than information provided in compliance with the study protocol (i.e., the chest radiograph showing the target nodule, and the age, sex, smoking history, and tuberculosis status of the patient). The panel convened at a central location 7 times from February 2000 to July 2003. Multiple reader sessions were arranged to minimize fatigue by limiting the number of studies reviewed to no more than 100 per reader. Each participant image was reviewed twice by a reader of each modality (i.e., 2 PET readers interpreted each participant PET image; and 2 CT readers interpreted each participant CT image). PET and CT readers were asked to use a 5-point ordinal scale for ratings: definitely benign, probably benign, indeterminate, probably malignant, and definitely malignant. The PET and CT readers on the panel used the criteria shown in Table 1 to make these ratings. Disagreements between readers were not resolved; instead, the decision of one of the readers was selected at random for each participant. We took this approach to better reflect the method of interpretation seen in the clinical setting.
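The random selection of one reader's decision per participant can be illustrated with a minimal sketch. The data layout, the function name `select_reading`, and the fixed seed are assumptions made for illustration; the study's actual randomization procedure is described in the design paper (9), not here.

```python
import random


def select_reading(readings_by_participant, seed=0):
    """Pick one reader's rating at random for each participant.

    readings_by_participant maps a participant ID to the list of ratings
    given by the readers of one modality (hypothetical layout).
    A seeded generator is used only so the sketch is reproducible.
    """
    rng = random.Random(seed)
    selected = {}
    # Iterate in sorted order so the result does not depend on dict ordering.
    for participant_id in sorted(readings_by_participant):
        ratings = readings_by_participant[participant_id]
        selected[participant_id] = rng.choice(ratings)
    return selected
```

With this approach, a disagreement between two readers is never adjudicated; the analysis simply carries forward whichever rating was drawn for that participant.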
The sample size for SNAP was based on the primary comparison of the sensitivity of CT versus PET, adjusting for the correlation that resulted from performance of both tests on the same participant. Preliminary studies suggested an SE of 0.94, a 22% prevalence of malignancy, and a correlation of less than 0.30. Based on these and an expected difference of 7% in the sensitivity rates, the original sample size of 900 was chosen (90% power and an α of 0.05). However, at the request of the Data and Safety Monitoring Board, an interim analysis to check the assumptions was conducted and found a higher-than-expected correlation and prevalence of malignancy. As a consequence, the Data and Safety Monitoring Board reduced the study sample size from 900 to 400.
Estimates of diagnostic accuracy for PET and CT were calculated for those participants for whom there was a CT and PET reading and a valid reference standard. For each participant, a reader pair (PET and CT) and reference test result were selected at random and used to estimate sensitivity, specificity, and the receiver operating characteristic (ROC) curve; further detail on this method can be found in the previously reported study design paper (9). Sensitivity and specificity were estimated for each level of diagnostic confidence (definitely benign, probably benign, indeterminate, probably malignant, and definitely malignant). Confidence intervals (CIs) around the estimates of sensitivity and specificity were estimated (13). The ratings were dichotomized, with definitely benign and probably benign considered negative, and indeterminate, probably malignant, and definitely malignant considered positive. Interval likelihood ratios (LRs) (14,15) were calculated (16) for each level of confidence and for these dichotomized sets. ROC curves, areas under the curve (AUCs), and 95% CIs were derived for each modality with RockIt 0.9B (17) as described by Dorfman and Alf (18,19). AUCs for the 2 modalities were compared accounting for correlation between test results because each participant acted as his own control (20).
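The dichotomization and the interval LRs described above can be sketched as follows. This is an illustrative sketch, not the study's analysis code: the function names are hypothetical, and returning `None` when no benign nodule received a given rating (so the ratio is undefined) is a choice made here for simplicity.

```python
RATINGS = [
    "definitely benign",
    "probably benign",
    "indeterminate",
    "probably malignant",
    "definitely malignant",
]
# As in the text: indeterminate and both malignant ratings count as positive.
POSITIVE = set(RATINGS[2:])


def sensitivity_specificity(ratings, is_malignant):
    """Sensitivity and specificity of the dichotomized ratings."""
    tp = fn = tn = fp = 0
    for rating, malignant in zip(ratings, is_malignant):
        positive = rating in POSITIVE
        if malignant and positive:
            tp += 1
        elif malignant:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def interval_likelihood_ratios(ratings, is_malignant):
    """LR per rating level: P(rating | malignant) / P(rating | benign)."""
    n_malignant = sum(1 for m in is_malignant if m)
    n_benign = len(is_malignant) - n_malignant
    ratios = {}
    for level in RATINGS:
        in_malignant = sum(1 for r, m in zip(ratings, is_malignant) if m and r == level)
        in_benign = sum(1 for r, m in zip(ratings, is_malignant) if not m and r == level)
        if in_benign == 0:
            ratios[level] = None  # undefined: no benign nodule had this rating
        else:
            ratios[level] = (in_malignant / n_malignant) / (in_benign / n_benign)
    return ratios
```

An interval LR above 1 means the rating is more common among malignant than benign nodules; a value near 0 means the rating argues strongly for a benign diagnosis.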
The potential for bias in the estimate of sensitivity and specificity was evaluated using sensitivity analyses. In brief, when no reference standard had been identified for a study participant, the participant's study records and medical records that were available were reviewed by a panel that included the study cochairs and 3 other study investigators. This panel established a reference standard rating for the participant on a 5-point ordinal scale (definitely benign to definitely malignant), and sensitivity and specificity were reestimated in a systematic manner (the supplemental data provide a complete description of this process).
Inter- and intrareader reliability was estimated using a weighted κ-statistic (21). Statistical analyses were performed using SAS 9.0 (SAS Institute).
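For readers unfamiliar with the weighted κ, a minimal sketch of the calculation on a 5-point ordinal scale follows. The source does not state which weighting scheme was used; linear weights are assumed here purely for illustration, and the function name and integer coding of ratings (0 = definitely benign to 4 = definitely malignant) are also assumptions.

```python
def weighted_kappa(ratings_a, ratings_b, n_levels=5):
    """Linearly weighted kappa for two sets of ordinal ratings coded 0..n_levels-1.

    Assumption: linear agreement weights w_ij = 1 - |i - j| / (n_levels - 1).
    """
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")

    # Observed joint proportions of (rating A, rating B).
    observed = [[0.0] * n_levels for _ in range(n_levels)]
    for i, j in zip(ratings_a, ratings_b):
        observed[i][j] += 1.0 / n

    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    p_observed = 0.0  # weighted observed agreement
    p_expected = 0.0  # weighted agreement expected by chance
    for i in range(n_levels):
        for j in range(n_levels):
            weight = 1.0 - abs(i - j) / (n_levels - 1)
            p_observed += weight * observed[i][j]
            p_expected += weight * row[i] * col[j]

    if abs(1.0 - p_expected) < 1e-12:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)
```

Unlike the unweighted statistic, this version gives partial credit when two readings differ by only one category (for example, probably malignant versus definitely malignant), which suits an ordinal confidence scale.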
| RESULTS |
|---|
The characteristics of the nodules are detailed in Table 2. Of the 184 malignant nodules, 35% were adenocarcinoma, 30% were squamous cell carcinoma, and 20% were other non–small cell lung cancer. Metastatic disease to the lung accounted for less than 10% of the total, and less than 2% of the nodules were bronchioloalveolar carcinoma. A definitively benign condition was known for 45 of the 160 benign nodules; the remainder were pathologically classified as other. The mean size for all nodules was 16.4 mm. Malignant nodules had a mean size of 18.9 mm (SD, 6.8); benign nodules, 13.3 mm (SD, 5.2). The predominant location of the nodules was the upper lung zones, with more nodules found in the right lung than the left. Nodules in the left lung were more likely to be malignant than those on the right, but this difference was not statistically significant.
A malignant final diagnosis was approximately 10 times more likely than a benign final diagnosis in participants with definitely malignant ratings on PET, but a malignant final diagnosis was only 3 times more likely when PET results were probably malignant (Table 3). The likelihood of malignancy was similar in participants with probably benign and definitely benign findings on PET. LRs for probably or definitely benign results on CT compared favorably with the corresponding LRs for PET.
The area under the curve for PET was 0.93 (95% CI, 0.90–0.95), and for CT it was 0.82 (95% CI, 0.77–0.86) (Fig. 2). The difference between these 2 estimates was statistically significant (P < 0.0001).
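As a rough illustration of what the AUC measures, the sketch below computes the empirical (nonparametric, Mann-Whitney) AUC from ordinal ratings. Note that this is not the method used in the study, which fitted ROC curves with RockIt as described by Dorfman and Alf; the empirical estimator is swapped in here only because it is short and self-contained. The function name and integer coding of ratings are assumptions.

```python
def empirical_auc(scores, is_malignant):
    """Empirical AUC: probability that a malignant nodule is rated higher
    than a benign one, counting ties as one half.

    scores are ordinal ratings coded as integers (higher = more suspicious).
    """
    malignant = [s for s, m in zip(scores, is_malignant) if m]
    benign = [s for s, m in zip(scores, is_malignant) if not m]
    if not malignant or not benign:
        raise ValueError("need at least one malignant and one benign case")

    wins = 0.0
    for m_score in malignant:
        for b_score in benign:
            if m_score > b_score:
                wins += 1.0
            elif m_score == b_score:
                wins += 0.5
    return wins / (len(malignant) * len(benign))
```

An AUC of 0.5 corresponds to ratings that do not discriminate at all, and 1.0 to ratings that rank every malignant nodule above every benign one.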
Sensitivity analyses showed little change in the estimates of diagnostic accuracy, indicating that our findings were robust to participant losses that were due to incomplete or inconclusive reference standard results (supplemental data).
Inter- and intrareader agreement for PET was excellent, with weighted κ-statistics of 0.826 (95% CI, 0.782–0.870) for interreader and 0.924 (95% CI, 0.901–0.946) for intrareader comparisons. Agreement within and between CT readers was good but considerably lower (interreader: 0.637; 95% CI, 0.542–0.731; intrareader: 0.759; 95% CI, 0.660–0.859).
| DISCUSSION |
|---|
We found that PET had similar sensitivity and superior specificity to CT in the characterization of SPNs. Accordingly, LRs were similar for PET and CT results that were probably or definitely benign, and such results on either test were strongly associated with a benign final diagnosis. However, definitely malignant results on PET were much more predictive of malignancy than were these results on CT. ROC curve analysis confirmed that PET is more accurate than CT. Figure 2 shows that all points for the PET curve lie outside the curve for CT, indicating that regardless of where the threshold that defines a positive test result is set, the accuracy of PET is superior.
PET is typically used as an adjunct to CT in the evaluation of suggestive nodules (26). Our findings support this approach. We found that PET correctly classified 58% of the benign nodules that had been incorrectly classified as malignant on CT. In addition, 25% of nodules were characterized as indeterminate by CT readers, whereas only 1% of nodules were classified as indeterminate by PET readers. Nodules that were classified as indeterminate on CT were correctly characterized on PET in over 80% of the cases (sensitivity, 83%; specificity, 89%).
Gould et al. (8) reported a sensitivity of 94.2% and specificity of 83.3% in their metaanalysis that included a total of 450 patients with lung nodules in 13 small studies. Here, we found lower sensitivity and similar specificity, which would be consistent with those expected in a study with less verification bias (27). These findings are also likely a result of the method used to assemble our study sample. The SNAP protocol called for enrollment of participants with suggestive lesions seen on chest radiographs, as opposed to the strategy used in other studies, which enrolled participants in whom nodules were first seen on CT. We believed that enrolling participants with nodules first seen on CT would have biased the study by increasing the prevalence of malignant nodules. Our enrollment protocol led to a much lower prevalence of malignant nodules (53%) than is typically seen in studies of SPN characterization. This feature of our study design also allowed for an unbiased comparison of the accuracy of PET and CT.
As reported here, CT was not as accurate (sensitivity, 95.6%; specificity, 40.6%) as reported in a recent multisite study of contrast-enhanced CT (28) (sensitivity, 98%; specificity, 58%). The prevalence of malignancy was similar in both studies (53% in SNAP vs. 48% in the enhanced CT study), as were the distribution and mean nodule size. Therefore, it is likely that the difference seen was due to the use of a rigorous protocol with dynamic intravenous contrast enhancement, as opposed to the protocol used in SNAP, which was comparable to that seen in usual clinical practice. It is notable, however, that even with the performance seen in the contrast study, the specificity of CT remained below that seen for PET in SNAP.
We found that PET had superior interobserver and intraobserver reliability, compared with CT. In addition to the superior performance in characterizing indeterminate lesions, this reliability contributes to the superior accuracy of PET over CT. For characterization of solitary nodules, the interobserver reliability reported here for PET is similar to that previously reported in a smaller series (29), whereas for CT it is better than previously reported (30).
The results of this study may not be fully generalizable, inasmuch as the population was male with a high percentage of smokers. However, the prevalence of malignancy and the mean nodule size were similar to those of another large multicenter study with a mixed-sex population (25). The study design incorporated here, which relied on random selection of a single reader from a pair to estimate sensitivity and specificity, is somewhat novel. Although this design has undergone preliminary peer review in our study design paper (9), it should not be considered to have been fully tested by the rigors of peer review. Our analyses have not shown any bias that might be introduced by relying on this method, but readers considering incorporating such a design in their own investigations should be cautioned to await more extensive tests of the limitations and bias of this trial design.
The estimates of accuracy in this study were based on 344 participants who underwent both PET and CT and for whom a diagnosis was obtained on the basis of tissue or follow-up. There were 128 additional participants (27% of the total sample of 472) for whom no gold standard was obtained. This limitation may restrict the validity of our results. However, as shown in Supplemental Table 1, no differences were seen between the characteristics of those excluded from the analysis and the characteristics of those who were included. We also conducted a sensitivity analysis that demonstrated that using a less rigorous reference standard did not change the estimates of sensitivity and specificity.
Some might argue that with the advent of integrated PET/CT scanners, our comparison of dedicated PET and CT technology is dated. There is some evidence that integrated PET/CT is more accurate than dedicated PET for lung cancer staging (31,32). In this regard, and in uncommon circumstances, the CT component might help to improve the performance of PET by identifying rapid growth consistent with an infectious process, providing alternate diagnostic hypotheses, demonstrating a typical pattern of bronchioloalveolar carcinoma, and similar advantages.
| CONCLUSION |
|---|
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
| References |
|---|