Abstract
Previously, we showed that CT window and level settings of 1,600 and −300 Hounsfield units, respectively, and autocontouring using an 18F-FDG PET 50% intensity level correlated best with pathologic results. The aim of this study was to compare this autocontouring method with manual contouring and to determine which method is better. Methods: Seventeen patients with non–small cell lung cancer underwent 18F-FDG PET/CT before surgery. The maximum tumor diameter was determined on pathologic examination. Seven sets of gross tumor volumes (GTVs) were defined. The first set (GTVCT) was contoured manually using only CT information. The second set (GTVAuto) was autocontoured using a 50% intensity level on 18F-FDG PET images. The third set (GTVManual) was contoured manually on PET images using a visual method. The other 4 sets were composite volumes derived from fused CT and 18F-FDG PET images: GTVCT+Auto, GTVCT+Manual, GTVCT–Auto, and GTVCT–Manual. To quantify the degree to which CT and 18F-FDG PET defined the same region of interest, a matching index was calculated for each case. The maximum diameter of each GTV was compared with the maximum diameter on pathologic examination. Results: The median GTVCT, GTVAuto, GTVManual, GTVCT+Auto, GTVCT+Manual, GTVCT–Auto, and GTVCT–Manual were 6.96, 2.42, 4.37, 7.46, 10.17, 2.21, and 3.38 cm³, respectively. The median matching indexes of GTVCT versus GTVCT+Auto, GTVAuto versus GTVCT+Auto, GTVCT versus GTVCT+Manual, and GTVManual versus GTVCT+Manual were 0.86, 0.65, 0.88, and 0.81, respectively. The correlation coefficients between the maximum diameters of GTVCT, GTVAuto, GTVManual, GTVCT+Auto, and GTVCT+Manual and the maximum diameter on pathologic examination were 0.87, 0.83, 0.93, 0.86, and 0.94, respectively. Conclusion: The matching index was higher for manual contouring than for autocontouring using a 50% intensity level on 18F-FDG PET images.
When using a 50% intensity level to contour the target of non–small cell lung cancer, one should also consider using manual contouring of 18F-FDG PET to check for any missed disease.
It is well established that 18F-FDG PET plays an important role in the staging of non–small cell lung cancer (NSCLC). Multiple studies have demonstrated the utility of PET for improving staging accuracy. In an overview of the available literature, 18F-FDG PET was found to have a 79%–100% sensitivity and a 40%–90% specificity in diagnosing primary lung cancer (1).
The conventional imaging modality for therapy planning in NSCLC is CT. Targeting of the gross tumor has been facilitated by CT simulation, which allows more accurate delineation of the tumor. Multimodality imaging that combines functional information, such as that from PET, with anatomic information has allowed further refinement of the therapy planning process and has a significant impact on the planning target volume. However, much uncertainty remains about the most appropriate threshold for defining a PET target volume in NSCLC therapy planning, and few data are available comparing gross tumor volumes (GTVs) contoured with different PET/CT thresholds against gross tumor size on pathologic examination. Institutions use various methods to define the PET volume, including the halo phenomenon, an absolute standardized uptake value (SUV) (2), a regression-function SUV threshold (3), a percentage of the maximum SUV intensity (4–7), identification of the anatomic structures involved on 18F-FDG imaging (8), and simple visual evaluation of PET images (9–12). These methods have produced large differences in target volume between CT-based and PET/CT-based therapy planning. Because PET tumor contours are defined inconsistently in the published literature, the available data are difficult to interpret, leaving clinicians uncertain how to incorporate PET into the therapy planning process. Because of possibly incomplete tumor coverage, a threshold of 40% of the maximum SUV did not appear generally suitable for target volume delineation, whereas contrast-oriented methods of contour definition gave more satisfactory results (13). Our previous study (14) examined how varying the CT window and level parameters and the PET intensity thresholds affected the radiologic tumor volumes, compared with the diameter measured on pathologic examination.
Our results showed that setting a 50% intensity level on PET and a CT window of 1,600 Hounsfield units (HU) and level of −300 HU correlated best with the maximum diameter of the primary tumor as measured on pathologic examination.
The aim of this study was to compare GTVs autocontoured using a PET 50% intensity level with GTVs contoured manually by a visual method, relative to pathologic results, and to determine which is the better method for target delineation on PET/CT images.
MATERIALS AND METHODS
Characteristics of Patients and Tumors
This study was approved by our local institutional research ethics board. Each patient was required to have pathologic confirmation of NSCLC. Seventeen patients with surgically resectable NSCLC underwent PET/CT before surgery between December 2004 and May 2007. The characteristics of the patients and primary tumors are shown in Table 1.
PET/CT Acquisition and Image Registration
Patients were asked to consume a high-protein, low-carbohydrate diet (to reduce myocardial uptake of 18F-FDG) and to avoid vigorous exercise in the 24 h preceding imaging. Patients fasted for at least 6 h before the intravenous injection of 18F-FDG. Blood glucose levels were checked and recorded. The injected activity of 18F-FDG was 185–370 MBq, depending on patient weight. Patients rested for approximately 1 h before imaging. Free-breathing PET and CT images were acquired using a Gemini PET/CT scanner (Philips) or a Discovery PET/CT scanner (GE Healthcare). The interval between 18F-FDG injection and the start of the PET acquisition was 40–60 min. First, a topogram was obtained from the skull to the mid thigh. Second, CT images (3-mm slices at 3-mm intervals) were typically obtained from the base of the skull through the proximal thighs without oral or intravenous contrast agents.
PET data were acquired for 3 min per table position, with a 50% overlap between positions. The data were reconstructed using a 3-dimensional row-action maximum-likelihood algorithm on the Gemini scanner and ordered-subsets expectation maximization on the Discovery scanner, and they were corrected for attenuation using a CT-derived transmission map. The voxel size after reconstruction was 4.0 × 4.0 × 4.0 mm.
Once the PET and CT images were acquired, the image datasets were transferred to a Pinnacle therapy planning workstation (version 8.0, Philips) for image coregistration.
GTV Definition and Delineation
Target volumes were defined according to a protocol prepared before the study. Briefly, the GTV was defined as the gross macroscopic tumor visible on the CT or PET scans. Only primary lung tumors were contoured, and guidelines were established for contouring GTVs. A standardized window–level setting was used to manually contour the GTV on PET images.
Seven sets of GTVs were defined by a physician observer, with agreement by a second physician observer (Table 2; Fig. 1). GTVCT was delineated manually using only CT information. GTVAuto was autocontoured using a 50% intensity level on PET images. GTVManual was contoured manually by a visual method using only PET images. The other 4 GTVs were composite volumes derived from the fused CT and PET images: GTVCT+Auto, GTVCT+Manual, GTVCT–Auto, and GTVCT–Manual. Plus (+) denotes the union of GTVCT with GTVAuto or GTVManual, so the volume must include both, and minus (−) denotes their intersection (Figs. 1C and 1D). The composite volume was delineated by first creating a union of the CT and PET regions of interest. For example, to contour GTVCT+Manual, a GTV contour was first obtained using CT information only, a second GTV contour was then obtained by a visual method using PET information, and finally the composite GTVCT+Manual was contoured manually on the fused PET/CT images. When a GTV was contoured on the PET image, the maximum intensity within the primary tumor was measured from the voxel values. For GTVAuto, the planning software automatically delineated the PET-imaged object using a threshold of 50% of the maximum intensity within the primary tumor on each PET transaxial image. GTVManual was defined as the PET-visualized enhancing gross tumor (2), with the GTV edge placed at the maximum local gradient magnitude distinguishable by the observer.
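The autocontouring rule and the composite-volume operators described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the planning software's implementation: each GTV is represented as a set of (slice, row, column) voxel indices, each PET slice is assumed to be already cropped to the region around the primary tumor (so the per-slice maximum is the tumor maximum), and the function names and toy intensities are hypothetical. A real autocontour would also keep only the connected region belonging to the tumor, which is omitted here.

```python
# Hypothetical sketch: a GTV is a set of (slice, row, col) voxel indices, and each
# PET slice is a 2-D list of intensities assumed to be cropped around the tumor.

def autocontour(pet_slices, fraction=0.5):
    """Keep voxels at or above `fraction` of the maximum intensity on their own slice."""
    voxels = set()
    for z, plane in enumerate(pet_slices):
        slice_max = max(max(row) for row in plane)
        if slice_max <= 0:
            continue  # no uptake on this slice, so nothing to contour
        cutoff = fraction * slice_max
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value >= cutoff:
                    voxels.add((z, y, x))
    return voxels


def plus_composite(gtv_ct, gtv_pet):
    """'+' composite (e.g. GTVCT+Auto): union, so both volumes are included."""
    return gtv_ct | gtv_pet


def minus_composite(gtv_ct, gtv_pet):
    """'-' composite (e.g. GTVCT-Auto): intersection of the two volumes."""
    return gtv_ct & gtv_pet


# Toy example: two 3 x 3 slices of hypothetical PET intensities.
pet = [
    [[1, 2, 1],
     [2, 10, 6],
     [1, 4, 1]],
    [[0, 1, 0],
     [1, 4, 3],
     [0, 1, 0]],
]
gtv_auto = autocontour(pet)                  # slice 0 cutoff 5.0, slice 1 cutoff 2.0
gtv_ct = {(0, 1, 0), (0, 1, 1), (1, 1, 1)}   # hypothetical CT contour
print(sorted(gtv_auto))  # [(0, 1, 1), (0, 1, 2), (1, 1, 1), (1, 1, 2)]
print(len(plus_composite(gtv_ct, gtv_auto)), len(minus_composite(gtv_ct, gtv_auto)))  # 5 2
```

Note that the cutoff is recomputed on every transaxial slice, as the text describes, so a slice containing only the faint edge of the tumor still yields voxels relative to its own, lower maximum.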
GTVCT was manually delineated using a lung window of 1,600 HU and a level of −300 HU on each transaxial image without knowledge of the PET results. GTVCT was always contoured first, and the PET image was masked while the observer drew the GTVCT contours. All areas of gross primary lung tumor were contoured using lung window settings for the interface between tumor and lung and using modified mediastinal window settings (window, 600 HU; level, 40 HU) for the interface if the tumor was close to the mediastinum.
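A window/level setting determines which Hounsfield-unit range is spread across the display gray scale: values at or below level − width/2 appear black and values at or above level + width/2 appear white. The sketch below uses a simplified linear mapping (the DICOM VOI LUT formula adds small half-unit offsets) and hypothetical function names to show the HU ranges implied by the two settings used for GTVCT.

```python
def window_bounds(width_hu, level_hu):
    """Lowest and highest HU values spread across the gray scale for a window/level."""
    half = width_hu / 2
    return level_hu - half, level_hu + half


def to_gray(hu, width_hu, level_hu, levels=256):
    """Simplified linear window mapping of one HU value to a gray level (0..levels-1)."""
    lower, upper = window_bounds(width_hu, level_hu)
    if hu <= lower:
        return 0               # at or below the window: black
    if hu >= upper:
        return levels - 1      # at or above the window: white
    return int((hu - lower) / (upper - lower) * (levels - 1))


print(window_bounds(1600, -300))  # lung window for GTVCT -> (-1100.0, 500.0)
print(window_bounds(600, 40))     # modified mediastinal window -> (-260.0, 340.0)
print(to_gray(-300, 1600, -300))  # the level itself maps to mid-gray -> 127
```

The wide lung window therefore keeps both aerated lung (around −1,000 HU) and soft tissue on screen, while the narrower mediastinal window separates soft-tissue densities at the tumor–mediastinum interface.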
The maximum diameter of each GTV was measured on each PET or CT slice in the transverse, anteroposterior, and craniocaudal directions, and the greatest of these diameters was compared with the maximum diameter on pathologic examination. The volume of each GTV (CT, PET, or composite) was calculated for each patient by the planning software from the 3-dimensional reconstruction images.
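A simplified version of the diameter and volume measurements can be written as follows. It assumes, hypothetically, a GTV stored as a set of voxel indices on a grid with the 4.0 × 4.0 × 4.0 mm PET voxel size reported above, and it approximates each directional diameter by the edge-to-edge extent of the voxels along that axis; the planning software's actual measurement method is not described in this paper and may differ.

```python
VOXEL_MM = (4.0, 4.0, 4.0)  # assumed (z, y, x) voxel size in mm


def axis_extents_mm(voxels, voxel_mm=VOXEL_MM):
    """Edge-to-edge extent along z (craniocaudal), y (anteroposterior), x (transverse)."""
    if not voxels:
        raise ValueError("empty GTV")
    extents = []
    for axis in range(3):
        coords = [v[axis] for v in voxels]
        n_spanned = max(coords) - min(coords) + 1  # voxels from first to last, inclusive
        extents.append(n_spanned * voxel_mm[axis])
    return tuple(extents)


def max_diameter_mm(voxels, voxel_mm=VOXEL_MM):
    """Greatest of the three directional extents, the value compared with pathology."""
    return max(axis_extents_mm(voxels, voxel_mm))


def volume_cm3(voxels, voxel_mm=VOXEL_MM):
    """Voxel count times voxel volume, converted from mm^3 to cm^3."""
    voxel_volume_mm3 = voxel_mm[0] * voxel_mm[1] * voxel_mm[2]
    return len(voxels) * voxel_volume_mm3 / 1000.0


gtv = {(0, 1, 1), (0, 1, 2), (1, 1, 1), (1, 1, 2), (2, 1, 1)}  # hypothetical GTV
print(axis_extents_mm(gtv))  # (12.0, 4.0, 8.0)
print(max_diameter_mm(gtv))  # 12.0
print(volume_cm3(gtv))       # 0.32
```

With 4-mm voxels each voxel contributes 0.064 cm³, so the coarse PET grid limits how finely small tumor volumes and diameters can be resolved.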
GTV Comparison
The GTVs defined by the CT scans alone, by the PET scans using autocontouring or manual contouring, and by the fused PET/CT scans were compared. To quantify the degree to which CT and PET (autocontouring or manual contouring) defined the same region of interest, we calculated a matching index (MI) for each case using a previously described method (15):

$$\mathrm{MI} = \frac{(a - a \setminus b) + (b - b \setminus a)}{a + b}$$

where a and b are the two volumes being compared and a \ b denotes the part of volume a lying outside volume b.
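Reading a \ b as the part of volume a outside volume b, each bracketed term in the numerator equals the overlap, so the index reduces to 2|a ∩ b|/(|a| + |b|), the same form as the Dice similarity coefficient: 1 for identical volumes and 0 for disjoint ones. The sketch below computes it on voxel sets; the set-difference reading, the function name, and the toy volumes are assumptions for illustration.

```python
def matching_index(a, b):
    """[(|a| - |a minus b|) + (|b| - |b minus a|)] / (|a| + |b|) for voxel sets a, b."""
    a, b = set(a), set(b)
    if not a and not b:
        raise ValueError("both volumes are empty")
    overlap_via_a = len(a) - len(a - b)  # |a| - |a \ b| equals |a intersect b|
    overlap_via_b = len(b) - len(b - a)  # |b| - |b \ a| equals |a intersect b|
    return (overlap_via_a + overlap_via_b) / (len(a) + len(b))


ct = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}  # hypothetical GTVCT
pet = {(0, 1, 1), (0, 1, 2)}                       # hypothetical PET GTV
composite = ct | pet                               # "+" composite (union)
print(matching_index(ct, composite))   # 8/9, about 0.889
print(matching_index(pet, composite))  # 4/7, about 0.571
```

In this toy case the CT volume, which makes up most of the union, scores higher against the composite than the small PET volume does, because a composite built as a union always contains the larger contributor almost entirely.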
Pathology Procedure
All 17 patients underwent surgical resection of their lung tumor. Immediately after surgery, the involved lung lobes were inflated and fixed for 12–24 h in 10% neutral buffered formalin. Specimens were then serially sectioned at 3- to 5-mm intervals, and the maximum diameter of the primary tumor was measured by macroscopic examination in 3 dimensions. Sections for histologic examination of the tumor and lymph nodes were taken and processed for paraffin embedding and sectioning using standard protocols.
Statistical Analysis
GraphPad Prism statistical analysis software was used. One-way ANOVA was used to determine differences among groups. The volumes obtained with each delineation method were measured for each case, and results are summarized as mean, median, and range. The mean differences among the GTVs obtained with the different contouring methods were calculated. Linear and logarithmic regression analyses were used to determine the relationship between the maximum diameter of the GTV and the pathologic maximum diameter. The Pearson correlation coefficient was used to compare pathologic and imaging estimates of maximum tumor diameter. Two-tailed P values are provided, and P values of less than 0.05 were considered statistically significant.
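For readers reproducing the radiologic–pathologic comparison outside GraphPad Prism, the Pearson correlation coefficient and an ordinary least-squares line can be computed directly. This is a minimal sketch with hypothetical function names and toy numbers, not the study data; Python 3.10 and later also provide `statistics.correlation` and `statistics.linear_regression` for the same quantities.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return means and the centered sums Sxy, Sxx, Syy used by both statistics."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between paired measurements."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    mx, my, sxy, sxx, _ = _centered_sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical paired maximum diameters (cm), not the study data.
pathology = [1.0, 2.0, 3.0]
imaging = [1.0, 3.0, 2.0]
print(pearson_r(pathology, imaging))   # 0.5
print(linear_fit(pathology, imaging))  # (0.5, 1.0)
```

A logarithmic fit of the form y = a + b ln x can reuse `linear_fit` by passing `[math.log(v) for v in x]` as the predictor, provided all x values are positive.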
RESULTS
GTVs
The GTVs are shown in Table 3. GTVCT was larger than GTVAuto in 88% of cases (15/17), GTVAuto was larger than GTVCT in 12% (2/17), GTVCT was larger than GTVManual in 76% (13/17), and GTVManual was larger than GTVCT in 24% (4/17).
Autocontouring and Manual Contouring Matching Index in Composite Volume
The matching indexes of GTVCT, GTVAuto, and GTVManual versus the composite GTVs are shown in Table 4. The median matching indexes of GTVCT versus GTVCT+Auto, GTVAuto versus GTVCT+Auto, GTVCT versus GTVCT+Manual, and GTVManual versus GTVCT+Manual were 0.86, 0.65, 0.88, and 0.81, respectively. The matching index was higher with PET manual contouring than with PET autocontouring. However, the differences between groups were not significant, except for GTVCT versus GTVCT+Auto compared with GTVCT versus GTVCT–Auto.
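The one-way ANOVA used for these group comparisons can be sketched as follows. The function returns only the F statistic and its degrees of freedom; converting F to a P value requires the F-distribution cumulative distribution function, which is not in the Python standard library (`scipy.stats.f_oneway` returns both). The function name and the toy groups are hypothetical and are not the matching indexes from Table 4.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least 2 nonempty groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([1, 2, 3], [4, 5, 6]))  # (13.5, 1, 4)
```

A large F indicates that the group means differ by more than the within-group scatter would explain; with only 17 cases per group, even moderate differences in median matching index may not reach significance.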
Radiologic–Pathologic Correlation
A strong correlation was found between maximum diameter on pathologic examination and maximum diameter of GTVs (Pearson correlation coefficient, Table 5). The best correlation with pathology was found with GTVCT+Manual and GTVManual. GTVCT+Manual correlated better with pathology than did GTVCT+Auto (Fig. 2). Also, GTVManual correlated better with pathology than did GTVAuto.
DISCUSSION
PET is a significant advance in cancer imaging with great potential for optimizing radiation therapy planning (16). The delineation of target volume is a critical step in high-precision radiation therapy planning (17). Both good image quality and good delineation protocols are crucial for target volume definition (18). In this study, we evaluated the degree to which PET autocontouring and manual contouring defined the same volume of interest in NSCLC.
It is clear that, with PET-defined tumor volumes in radiation therapy planning, variations in the image signal threshold can significantly affect the contour of the GTV. Various methods are currently used to determine the outline of 18F-FDG–positive tissue, and there is no validated standardized method for setting this threshold. The simplest method, and a widely used one, is visual interpretation of the PET scan, with contours defined visually in cooperation with an experienced nuclear medicine physician (9,19,20). Other methods are SUV based, using a percentage of the maximum SUV, a regression function, or the source-to-background ratio (3). Published methods based on a threshold set as a percentage of the maximum SUV (percentage threshold) have used values ranging from 15% to 50% (2,4,8,9,14,19,21–25). The reported variability of threshold values for lung lesions of different volumes indicates that no single value is applicable to all patients and that techniques for setting individual thresholds need to be defined and standardized. Although many investigators have used a percentage of the maximum SUV intensity to define a tumor on PET, some suggest that this fixed-threshold intensity is inadequate for target volume definition and tends to underestimate target volumes (3). Despite these views, we believe that there are several advantages to using a fixed-threshold intensity, as shown in our previous study (14), in which only 70% of patients could be contoured using a 50% intensity level.
The gold standard for validating a threshold technique for tumor definition would be comparison with histologic specimens. This poses a particular problem in lung cancer, in which accurate spatial correlation of excised surgical specimens with imaging is difficult to achieve. As more controlled studies report tumor outcomes or correlate PET volumes with pathologic data, enough information may become available to develop a more unified definition for PET volume contouring. Our previous study compared the maximum diameter of the tumor on pathologic examination with the maximum diameter of the GTV in 3-dimensional contouring on PET or CT images and showed that the 50% intensity level and a CT window of 1,600 HU and level of −300 HU correlated best with the actual geometry (14). In the current study, the matching index was higher with PET manual contouring than with PET autocontouring. However, we correlated the imaging measurements only with the maximum tumor diameter measured on pathologic examination.
Many factors affect SUV measurements and, therefore, tumor contours (e.g., the metabolic activity of a tumor, heterogeneity within a tumor, and tumor motion), making SUV-based contouring methods unreliable. Phantom studies (10) can help define percentage-of-maximum-uptake thresholds for the tumor edge but cannot adequately simulate the effect of background uptake seen clinically (8,21,23,25,26). Correlating 18F-FDG PET with pathology could build on existing data (27); such studies could help characterize tumor boundaries, assist in image segmentation, and improve understanding of motion and apparent tumor volume. It is important to remember that a PET image is typically an average over multiple respiratory cycles, whereas a standard free-breathing CT image is simply a snapshot in time. Investigators at Memorial Sloan-Kettering Cancer Center found that maximum SUV may vary by as much as 24% between end inspiration and end expiration and that significant apparent tumor size reductions may occur (28). PET tumor-volume contouring methods that use fixed-intensity thresholds may therefore be more susceptible to motion error. Inaccurate PET/CT fusion could also be problematic. Building on the digital whole-mount histopathologic methods developed by Clarke et al. (29), one aim is to generate 3-dimensional pathologic reconstructions and then coregister and compare them with 18F-FDG PET/CT volumes. Such studies are in their infancy, and significant early challenges have been encountered in working with lung tissue. Stroom et al. (30) have pursued this approach in NSCLC, investigating the feasibility of pathology-correlated imaging for lung tumors while taking into account lung deformation after surgery; their results showed that pathology-correlated lung imaging is feasible and can be used to improve target definition. Geets et al. (31) addressed similar questions in head and neck cancer and found the 18F-FDG PET GTV more reflective of pathology than the MRI or CT GTV. van Baardwijk et al. (18) also observed a good correlation between the maximum tumor diameter on PET, delineated with an autocontouring method based on the source-to-background ratio, and the macroscopic tumor diameter of the surgical specimens in 23 cases of operable lung cancer.
None of the methods for automatic delineation of 18F-FDG–positive tissue can be regarded as reliable or standard. Because the delineation method greatly influences the size and shape of the GTV, multicenter studies must follow detailed protocols that include phantom evaluation and quality control (16). Until now, no objective data have been available on this point. In the view of many, the best method to date is contouring by the radiation oncologist, which requires an experienced nuclear medicine specialist and a predefined protocol (16). A significant limitation of our study is potential observer bias, because only one observer contoured the volumes; the second observer was used only for verification. Another limitation is that only the maximum tumor diameter on pathologic examination, measured macroscopically in 3 dimensions, was compared with the maximum GTV diameter in 3 dimensions. Other problems also remain, such as tumor shrinkage due to formalin fixation before pathologic examination and interobserver variation in PET/CT target volume delineation. Nevertheless, our data contribute to the existing literature that has attempted to define the NSCLC target on PET images for radiation therapy planning. Comparison of tumor volumes on pathologic examination with the GTVs is still needed.
CT is the current standard for NSCLC GTV delineation, but despite its excellent spatial resolution, substantial variation exists (32–34). According to the available literature, CT-based target volume delineation may over- or underestimate the extent of the GTV; Chan et al. (35) found that tumors were larger as measured on CT than as measured on the pathology slice. With 18F-FDG PET, the target volume can therefore either be enlarged to incorporate additional tumor tissue not identified by CT or reduced to exclude nontumor structures, thereby reducing the irradiated normal tissue and permitting escalation of the radiation dose (36). There is no reliable correlation between CT window and level settings and PET SUV, although Hong et al. (37) developed a method of correlating SUV with window and level thresholds. The use of variable CT threshold settings (18,36,38) can undoubtedly affect the GTV. Caldwell et al. (39) found a reduction in the ratio of largest to smallest GTV among observers when comparing PET/CT coregistered data with CT alone in a group of 30 patients. Although CT and PET may define quantitatively similar volumes, the qualitative definition of the target may vary substantially, with PET offering valuable metabolic information that could enlarge or reduce the target. In our previous study, the change in volume relative to the CT-based target varied greatly (14), and CT window and level settings of 1,600 and −300 HU, respectively, for the primary tumor provided the best correlation with the maximum pathologic diameter (14). Therefore, in this study, we chose these settings for CT contouring.
CONCLUSION
Our study compared automatic contouring and manual contouring in delineation of GTV for NSCLC. Contouring using a manual method correlated better with maximum diameter of the primary tumor on pathologic examination and provided a better matching index in the composite PET/CT GTV. In the PET/CT-delineated target volumes for the primary lung tumor, CT had a good matching index and should still be the basis of the defined GTV. When using a 50% intensity level to contour the target of NSCLC, one should consider manual contouring of 18F-FDG PET to check for any missed disease.
Acknowledgments
This study was supported by the Ontario Cancer Research Network (OCRN). Results from this study have been presented at the ASTRO annual meeting in Boston, September 21–24, 2008, and at the CARO annual scientific meeting in Montreal, Canada, September 10–13, 2008.
- © 2010 by Society of Nuclear Medicine
REFERENCES
- Received for publication April 7, 2010.
- Accepted for publication July 20, 2010.