Abstract
Phantom studies have shown improved lesion detection performance with time-of-flight (TOF) PET. In this study, we evaluate the benefit of fully 3-dimensional, TOF PET in clinical whole-body oncology using human observers to localize and detect lesions in realistic patient anatomic backgrounds. Our hypothesis is that with TOF imaging we achieve improved lesion detection and localization for clinically challenging tasks, with a bigger impact in large patients. Methods: One hundred patient studies with normal 18F-FDG uptake were chosen. Spheres (diameter, 10 mm) were imaged in air at variable locations in the scanner field of view corresponding to lung and liver locations within each patient. Sphere data were corrected for attenuation and merged with patient data to produce fused list-mode data files with lesions added to normal-uptake scans. All list files were reconstructed with full corrections and with or without the TOF kernel using a list-mode iterative algorithm. The images were presented to readers to localize and report the presence or absence of a lesion and their confidence level. The interpretation results were then analyzed to calculate the probability of correct localization and detection, and the area under the localized receiver operating characteristic (LROC) curve. The results were analyzed as a function of scan time per bed position, patient body mass index (BMI < 26 and BMI ≥ 26), and type of imaging (TOF and non-TOF). Results: Our results showed that longer scan times led to an improved area under the LROC curve for all patient sizes. With TOF imaging, there was a bigger increase in the area under the LROC curve for larger patients (BMI ≥ 26). Finally, we saw smaller differences in the area under the LROC curve for large and small patients when longer scan times were combined with TOF imaging. 
Conclusion: A combination of longer scan time (3 min in this study) and TOF imaging provides the best performance for imaging large patients or a low-uptake lesion in small or large patients. This imaging protocol also provides similar performance for all patient sizes for lesions in the same organ type with similar relative uptake, indicating an ability to provide a uniform clinical diagnosis in most oncologic lesion detection tasks.
The last few years have seen the introduction of time-of-flight (TOF) PET for clinical whole-body imaging, and the 3 major PET scanner manufacturers now have commercially available fully 3-dimensional (3D) PET/CT scanners with TOF capability (1–3). Although TOF PET was originally developed in the 1980s, its current mode of operation in a fully 3D scanner design with improved spatial resolution and iterative image reconstruction algorithms can lead to improvements in image quality that may not be quantified by the previously defined metrics of TOF gain. Past evaluations had shown that with TOF PET, improved image signal-to-noise ratio that is proportional to the square root of the object size and inversely proportional to the square root of the system timing resolution could be achieved (4,5). In recent years, the primary clinical imaging application using PET has been in oncology, in which lesion detection and quantification are the main tasks performed by the physicians. Consequently, recent evaluations of TOF imaging have focused on this area. Simulations and measurements have shown faster and more uniform convergence of lesion contrast with TOF PET in physical phantoms (1,6–9) and clinical patient studies (9,10). In addition, TOF PET lesion studies using numeric observers have shown improved lesion detectability in uniform objects with improving timing resolution (6,11). A simplification of these early lesion detection studies is the presence of a uniform background, which is not representative of most clinical imaging situations. In these studies, the task was also detection of a lesion at a known position (signal-known-exactly task). With statistical noise present in PET images, detecting a lesion at an unknown position is more challenging and thus may represent a more clinically relevant task. A recent study attempted to overcome some of these limitations by acquiring data on a TOF PET scanner with a physical anthropomorphic phantom (12). 
That study used numeric and nonclinician human observers to measure the impact on lesion detection and localization, and localized receiver operating characteristic (LROC) curve methodology showed improved performance with TOF imaging (12).
The goal of our study was to evaluate the benefit of TOF PET for lesion detection and localization as a function of multiple parameters (patient size, lesion location, scan time) in a real patient. Using a previously developed methodology, we inserted sphere data into clinical 18F-FDG scans of patients of varying sizes (13,14) to simulate the presence of lesions. Lesions were inserted in 2 organs (liver and lungs) in different locations, and images were reconstructed for 2 different scan times. Previously, we have reported on the accuracy of this technique in generating patient images with simulated lesions and presented results from a numeric observer analysis for lesion detectability (14). The results of that study directed the current human observer study; thus, a smaller, and more relevant, subsection of images was presented to the readers. For this study, we chose to emphasize 2 clinically challenging situations in patients: detection of small liver lesions with low uptake relative to local liver background and detection of small lung lesions with low absolute uptake and low uptake relative to local lung background. The liver lesions represent detection of lesions that are in a generally uniform local background but are subject to nonuniform attenuation in a patient body and nonuniform activity distribution in the surrounding regions. The lung lesions, on the other hand, represent detection of lesions that are in a nonuniform local background, in addition to being subject to nonuniform attenuation in a patient body and nonuniform activity distribution in the surrounding regions. Our hypothesis was that, compared with non-TOF PET, TOF PET would lead to improved lesion detection and localization for these 2 clinically challenging tasks, with a bigger impact in larger patients.
MATERIALS AND METHODS
Scanner and Image Reconstruction
All patient scans and lesion measurements were obtained on the Gemini TF PET/CT scanner (Philips Healthcare), which is a TOF-capable, fully 3D PET scanner, and a 16-slice Brilliance CT scanner (Philips Healthcare) (1). The PET component of this scanner uses 4 × 4 × 22 mm lutetium yttrium oxyorthosilicate crystals. This scanner has a measured spatial resolution of 4.8 mm near the center of the field of view and an intrinsic system timing resolution of 585 ps, although, because of the effects of higher counting rates, the clinical data presented here were acquired with a system timing resolution of 670 ps.
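As a rough illustration of what the quoted timing resolutions mean spatially, the sketch below converts a coincidence timing resolution into a localization uncertainty along the line of response (c·Δt/2) and evaluates the classical signal-to-noise estimate mentioned in the introduction (proportional to the square root of object size over that uncertainty). The function names and the 350-mm object diameter are illustrative assumptions, not values from this study, and the estimate is the traditional TOF gain metric rather than a prediction of the observer results reported here.

```python
import math

C_MM_PER_PS = 0.299792458  # speed of light in millimetres per picosecond


def tof_localization_fwhm_mm(timing_fwhm_ps: float) -> float:
    """Uncertainty along the line of response: dx = c * dt / 2."""
    return C_MM_PER_PS * timing_fwhm_ps / 2.0


def approx_snr_gain(object_diameter_mm: float, timing_fwhm_ps: float) -> float:
    """Classical TOF estimate sqrt(D / dx); a rule of thumb only."""
    return math.sqrt(object_diameter_mm / tof_localization_fwhm_mm(timing_fwhm_ps))


if __name__ == "__main__":
    # 585 ps (intrinsic) and 670 ps (clinical counting rates) are from the text.
    for timing_ps in (585.0, 670.0):
        dx = tof_localization_fwhm_mm(timing_ps)
        # 350 mm is a hypothetical patient cross-section, used only for illustration.
        gain = approx_snr_gain(350.0, timing_ps)
        print(f"{timing_ps:.0f} ps: dx = {dx:.1f} mm, approx. gain = {gain:.2f}")
```

Under this estimate the gain grows with object diameter, which is in line with the hypothesis that larger patients benefit more from TOF information.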
The Gemini TF scanner acquires list-mode data, which are reconstructed with and without TOF information using an ordered-subsets expectation maximization algorithm with 33 chronologically ordered subsets; for TOF reconstructions, a TOF kernel was incorporated into the forward and backward projections (15). The attenuation map was obtained from an unenhanced CT image acquired with normal patient breathing, whereas scatter was estimated using a TOF-extended single-scatter simulation (16,17). Attenuation, detector efficiency and normalization, scatter, and random coincidences are incorporated into the system model during image reconstruction to produce fully corrected images.
Patient Studies
For this investigation, we selected 100 patient studies with a normal 18F-FDG biodistribution and no evidence of abnormal lesions. On the basis of the standard imaging protocol followed at the time of this study at the PET Center at the University of Pennsylvania, each patient was scanned for 3 min/bed position, 60 min after the injection of 18F-FDG (555 MBq [15 mCi]). Because the data were acquired in list mode, we have the ability to retrospectively reconstruct for scan times shorter than 3 min/bed position. A complete patient study typically involves 8–10 overlapping bed positions to image the patient from the base of the brain to mid thighs. In this study, for each patient we selected a single bed position that was determined by experienced clinical readers to have normal 18F-FDG uptake in the thoracolumbar region (including lower lungs and upper liver) for insertion of lesion data. The 2 bed positions adjacent to this dataset were also reconstructed to perform slice overlapping and thus achieve image noise characteristics similar to a clinical image.
Generation of Lesion-Present Images
For every patient dataset, we chose specific regions of the lung and liver at which to add 10-mm diameter spherical lesions. Within each organ region, the exact lesion position was chosen randomly for each patient, so that the lesions did not appear at the same position in the image for all patients. We used 10-mm diameter plastic spheres (wall thickness, 1 mm) filled with 18F-FDG (5–50 MBq/mL) and acquired imaging data while the spheres, in air, were at locations within the scanner field of view that overlapped with the chosen region for each patient. Because no attenuation was present in these acquisitions, short scan times (1–2 min) were sufficient to obtain a reasonable number of sphere counts. The number of sphere counts needed for addition to patient data was estimated, and the number of counts collected in the sphere data acquisition always exceeded this number. Random and scatter coincidences in the sphere data acquisition were negligible (<3%).
Regions of interest with a diameter equal to that of the sphere were drawn in the fully corrected patient image at the specific lesion location to measure the mean activity concentration (CB). A sphere with an activity uptake ratio of u with respect to the local background would emit an additional (u – 1) × CB counts. Because the scanner geometric efficiency would be a function of sphere location, the (u – 1) × CB counts were also appropriately corrected for this effect; the resulting number of counts was extracted from the sphere-in-air list data to be merged with the patient data (extracted sphere list file). Because the sphere data were collected in air, the extracted list file was attenuated using the patient transmission map (from the CT image) to obtain the attenuated sphere list file. This file was then uniformly merged with the patient list file to obtain a fused, lesion-present dataset. The fused dataset was reconstructed using the list-mode reconstruction with the same transmission map and scatter estimate as the one generated for the original patient data. Figure 1 schematically shows the steps involved in the generation of the lesion-present dataset. In our previous work, we have successfully verified through a phantom study this process of inserting lesions with a predetermined activity uptake ratio at a fixed location (14). A consequence of the lesion-insertion process for this work was the generation of 3 single bed position list files (representing a 3-min scan) for image reconstruction per patient: a normal list file representing the original lesion-absent dataset, a fused list file representing the addition of a single lesion in the liver, and a fused list file representing the addition of a single lesion in the lung.
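The insertion steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the software used in the study: events are modelled as hypothetical `(timestamp, line_of_response_id)` tuples, attenuation is applied by randomly keeping each sphere event with survival probability exp(−∫μ dl) along its line of response, and merging is done by sorting on timestamp. The geometric-efficiency correction mentioned in the text is omitted.

```python
import math
import random


def extra_activity(u: float, c_b: float) -> float:
    """Activity a lesion with uptake ratio u adds over local background C_B."""
    return (u - 1.0) * c_b


def attenuate_events(events, mu_line_integrals, seed=0):
    """Thin sphere-in-air events to mimic attenuation in the patient.

    Each event survives with probability exp(-integral of mu along its LOR).
    Random thinning is an assumed implementation of the attenuation step.
    """
    rng = random.Random(seed)
    kept = []
    for event, integral in zip(events, mu_line_integrals):
        if rng.random() < math.exp(-integral):
            kept.append(event)
    return kept


def merge_by_time(patient_events, sphere_events):
    """Fuse two list files into one stream ordered by timestamp."""
    return sorted(patient_events + sphere_events, key=lambda e: e[0])


if __name__ == "__main__":
    # Hypothetical toy data: (timestamp in ms, line-of-response index).
    patient = [(0.0, 11), (2.0, 47), (5.0, 3)]
    sphere = [(1.0, 20), (3.0, 20), (4.0, 21)]
    attenuated = attenuate_events(sphere, [0.5, 1.2, 0.8], seed=1)
    fused = merge_by_time(patient, attenuated)
    print(extra_activity(3.5, 2.0), len(fused))
```

Because the fused stream keeps the original patient events untouched, the same transmission map and scatter estimate can be reused for reconstruction, as the text describes.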
FIGURE 1. Flow chart showing steps involved in generation of fused list files with lesion data inserted in patient dataset, followed by image reconstruction.
Human Observer Study
The 3 list files per patient (no lesion, lesion inserted in liver, and lesion inserted in lung) for a single bed position were reconstructed using TOF and non-TOF list-mode reconstruction. Lesion activity uptake ratios (relative to local background) were 3.5:1 and 3.0:1 for liver and lung lesions, respectively. Because the liver has a higher background uptake, the absolute uptake for the liver lesions was higher, whereas the absolute lesion uptake in the lung, for which normal organ uptake is low, was low. All list data were reconstructed for the full 3-min scan and a 1-min scan. Because we use 3 iterations of ordered-subsets expectation maximization reconstruction clinically, for this work we decided to restrict our evaluation of all images (TOF and non-TOF) to 3 iterations as well. There were 3 variable parameters for each patient: lesion location (lung, liver, or no lesion), scan time (1 or 3 min), and type of reconstruction (TOF or non-TOF). In addition, the patients were also separated into 2 equal population body mass index (BMI) categories: BMI less than 26, corresponding to small or average patients, and BMI 26 or more, corresponding to average or large patients. Hence, a total of 1,200 images were created (lesion presence or absence, 3; scan time, 2; reconstruction type, 2; and number of patients, 100) and separated into 24 different sets (lesion presence or absence, 3; scan time, 2; reconstruction type, 2; and BMI type, 2), with 50 images per set.
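The image counts quoted above follow directly from the factorial design, and a short enumeration makes the bookkeeping explicit. The variable names and label strings below are illustrative choices; only the factor levels and the patient count come from the text.

```python
from itertools import product

LESION = ("none", "liver", "lung")       # lesion presence or absence
SCAN_MIN = (1, 3)                        # scan time per bed position, min
RECON = ("TOF", "non-TOF")               # reconstruction type
BMI_GROUP = ("BMI<26", "BMI>=26")        # 2 equal-population categories
N_PATIENTS = 100

# One image set per combination of lesion state, scan time, recon, BMI group.
image_sets = list(product(LESION, SCAN_MIN, RECON, BMI_GROUP))

# Every patient contributes one image per lesion state, scan time, and recon.
total_images = len(LESION) * len(SCAN_MIN) * len(RECON) * N_PATIENTS
images_per_set = total_images // len(image_sets)

if __name__ == "__main__":
    print(len(image_sets), total_images, images_per_set)
```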
For our image reading, we had 5 readers representing different levels of specialization or expertise. Readers 1 and 2 were board-certified nuclear medicine physicians; readers 3 and 4 were recent medical graduates doing elective research work in nuclear medicine. Reader 5 was a PET physicist with more than 25 y of experience in PET image analysis and evaluation but no direct experience in clinical image interpretation. Each reader was given 30 images from each set, leading to 720 randomly distributed images to be read by every reader. A special viewing program was developed that displayed triplanar views (axial, coronal, and sagittal) of PET image sets. The viewing program featured scroll bars, performed triangulation between the 3 planes, and had adjustable window levels that could display images in different color maps. The readers were told that each image had either no lesion or only 1 lesion present in the liver or the lung. They were asked to localize the lesion position in 3 dimensions by clicking on the image and report with a confidence level (of which there were 6) the presence or absence of a lesion by clicking a button at the bottom of the image viewer. The reading for each image was then written to a text file for further processing. For data analysis, correct lesion localization was defined to be within 10 mm of the known lesion center. Each reader's dataset was analyzed using Swensson's LROCFIT program to generate the receiver operating characteristic and LROC curves and the value for area under the 2 curves (AROC and ALROC, respectively) (18) for 16 different categories (2 BMI levels × 2 scan times × 2 types of image reconstruction × 2 lesion locations).
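To make the scoring concrete, the sketch below applies the 10-mm localization criterion and computes a nonparametric (trapezoidal) estimate of the area under the LROC curve. This is not Swensson's LROCFIT, which fits a parametric model; it is a simpler empirical stand-in, and the function names, the inclusive 10-mm boundary, and the half-credit treatment of tied ratings are assumptions made for illustration.

```python
import math


def correctly_localized(marked_xyz, true_xyz, tolerance_mm=10.0):
    """True if the reader's mark lies within tolerance of the lesion center."""
    return math.dist(marked_xyz, true_xyz) <= tolerance_mm


def empirical_lroc_area(present, absent):
    """Trapezoidal estimate of the area under the LROC curve.

    present: list of (confidence_rating, localized_ok) for lesion-present images
    absent:  list of confidence ratings for lesion-absent images
    A lesion-present image scores only if it was correctly localized; it then
    earns 1 for each lesion-absent image rated lower and 0.5 for each tie.
    """
    if not present or not absent:
        raise ValueError("need at least one image of each kind")
    score = 0.0
    for rating_p, localized_ok in present:
        if not localized_ok:
            continue
        for rating_a in absent:
            if rating_p > rating_a:
                score += 1.0
            elif rating_p == rating_a:
                score += 0.5
    return score / (len(present) * len(absent))


if __name__ == "__main__":
    # Hypothetical ratings on a 6-level confidence scale.
    present = [(6, True), (5, False), (3, True)]
    absent = [1, 3, 4]
    print(empirical_lroc_area(present, absent))
```

Unlike the area under the ROC curve, this quantity gives no credit for a confident response at the wrong location, which is why incorrect localizations lower ALROC relative to AROC.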
RESULTS
In Figure 2A, we show sample reconstructed images from a patient with a BMI of 28.4, with a lesion inserted in the liver region. Figure 2B shows reconstructed images from a different patient with a BMI of 24.6 and a lesion inserted in the lung region. Images are shown for scan durations of 1 and 3 min/bed position with TOF and non-TOF image reconstruction. These images illustrate the type of cases presented to the readers and the increased challenge of lesion detection for shorter scan times and non-TOF image reconstruction.
FIGURE 2. Reconstructed images for liver lesion present in patient with BMI of 28.4 (A) and lung lesion present in patient with BMI of 24.6 (B). Arrows indicate location of inserted lesion. Color scale has been saturated in B to show lesion more clearly.
In Figures 3 and 4, we summarize our results for the AROC and ALROC values. Here, we show results for readers 1 and 2, who had the most experience in interpretation of clinical PET images. From Figure 3 we see that heavy patients (BMI ≥ 26) generally had lower AROC values than did the lighter patients (BMI < 26). Also, the AROC values for lung lesions were lower than those for liver lesions for all patient sizes. Our results show that, generally, the scans for 3 min/bed position, compared with scans for 1 min/bed position, led to higher AROC values for all patient sizes. TOF imaging, however, led to an improvement mainly in heavy patients for both liver and lung lesions. The ALROC values as shown in Figure 4 follow the same trends as seen with the AROC values, but the results are enhanced because of the inclusion of lesion localization effect. For example, the ALROC value for lesions was lower in large patients overall because of poor lesion localization and significantly compromised in the challenging situation of lung lesion detection. Generally, TOF imaging led to an improvement in the ALROC for liver lesions in heavy patients and lung lesions for both patient sizes (although reader 1 did not see a big improvement with TOF for lung lesions in light patients after a 1-min scan).
FIGURE 3. Results for AROC values obtained from observations of reader 1 for liver lesions (A), reader 1 for lung lesions (B), reader 2 for liver lesions (C), and reader 2 for lung lesions (D). Results are shown for TOF (T) and non-TOF (NT) images as function of scan time per bed position of 1 (1m) or 3 (3m) min and patient BMI < 26 (L) or BMI ≥ 26 (H).
FIGURE 4. Results for ALROC values obtained from observations of reader 1 for liver lesions (A), reader 1 for lung lesions (B), reader 2 for liver lesions (C), and reader 2 for lung lesions (D). Results are shown for TOF (T) and non-TOF (NT) images as function of scan time per bed position of 1 (1m) or 3 (3m) min and patient BMI < 26 (L) or BMI ≥ 26 (H).
Although the absolute performance of all readers varied somewhat, the rank ordering of the different ALROC values for each reader was generally the same within the error limits. A Tukey all-pairs comparison test for statistical significance of differences between the ALROC values for each reader was performed with a Bonferroni adjustment for multiple comparisons (19). In Table 1, we summarize the number of readers who had a statistically significant (P < 0.01) difference between their ALROC results for TOF versus non-TOF images. For the readers who showed a statistically significant (P < 0.01) difference, the TOF ALROC value was always higher than the corresponding non-TOF ALROC value. With TOF imaging, all or most readers therefore saw an improvement in the ALROC value for lung lesions for both patient sizes and the 2 scan times. For liver lesions, TOF imaging did not show a significant improvement in the light patients. These results are in agreement with what we observed earlier in the ALROC results shown in Figure 4 for readers 1 and 2. In Table 2, we summarize the number of readers who had a statistically significant (P < 0.01) difference between their ALROC results for the 1-min scan versus the 3-min scan. For the readers who showed a statistically significant (P < 0.01) difference, the 3-min scan ALROC value was always higher than the corresponding 1-min scan ALROC value, except for 1 reader, when reading images for patients with a BMI less than 26. In heavy patients, 3-min scans therefore led to an improvement for all readers for all imaging categories, whereas in light patients the gain with 3-min scans was not always present for the 5 readers.
TABLE 1. Summary of Number of Readers Who Had Statistically Significant (P < 0.01) Difference Between Their ALROC Results for TOF Versus Non-TOF Images
TABLE 2. Summary of Number of Readers Who Had Statistically Significant (P < 0.01) Difference Between Their ALROC Results for 1-Minute Versus 3-Minute Scans per Bed Position
In Figure 5, we plot the average ALROC values for all 5 readers for varying scan times, type of reconstruction, and patient BMI. Results are shown separately for the liver and lung lesions. As noted earlier in Figure 4 for readers 1 and 2, the average ALROC values for lung lesions were noticeably lower than those for liver lesions. This could be due to the use of a fixed uptake ratio for the inserted lesions relative to the local background, which makes lung lesion detection challenging because the lesions are located in a much noisier lung background (compared with a more uniform, less noisy liver background). In Tables 3 and 4, we show results from a Tukey all-pairs comparison test for statistical significance of differences between the average ALROC values for all readers. Generally, heavy patients had lower ALROC results than light patients, and compared with 1-min scans, 3-min scans led to improved performance (statistically significant differences). For the liver lesions, we found that TOF imaging led to a statistically significantly (P < 0.01) improved ALROC value in heavy patients for both scan times, whereas in light patients it led to an improvement only in the 1-min scans. This difference in ALROC value is important because, overall, the ALROC results for all non-TOF images were statistically significantly (P < 0.01) lower for heavy patients than for light patients, and so TOF imaging led to improved performance for the heavier patients. The challenging situation of lung lesion detection showed that TOF reconstruction always led to a statistically significantly improved ALROC value. Looking at Table 4 we find that for lung lesions the differences in the ALROC results for 3-min scans in both light and heavy patients were not statistically significant, indicating a more uniform performance for different patient sizes. 
However, compared with the liver lesions, lung lesions had low ALROC values overall and it was only with the use of long scans (3-min scans here) and TOF information that ALROC values in the range measured for the liver lesions could be obtained.
FIGURE 5. Results for average ALROC values obtained from all 5 reader observations for liver (A) and lung (B) lesions. Results are shown for TOF (T) and non-TOF (NT) images as function of scan time per bed position of 1 (1m) or 3 (3m) min and patient BMI < 26 (L) or BMI ≥ 26 (H).
TABLE 3. Results from Tukey All-Pairs Comparison (P Value with Bonferroni Adjustment for Multiple Comparisons) of Average ALROC Results for All Readers for Liver Lesions
TABLE 4. Results from Tukey All-Pairs Comparison (P Value with Bonferroni Adjustment for Multiple Comparisons) of Average ALROC Results for All Readers for Lung Lesions
DISCUSSION
Qualitatively, in Figure 2A (heavy patient) the lesion is clearly visible in the 1- and 3-min TOF and 3-min non-TOF images (as indicated by the arrows). In Figure 2B (light patient), both TOF images and the 3-min non-TOF image provide reasonable confidence in correctly localizing the lesion, but the 1-min non-TOF image may be challenging to read. Quantitatively, our results suggest that although AROC indicates improved performance with TOF in heavy patients, the accuracy of lesion localization by human observers can be reduced in real patients because of a heterogeneous background. In this situation, we believe that the ALROC metric may provide a better measure of gain in clinical diagnostic situations.
As summarized in Tables 3 and 4, TOF PET was shown to be consistently better than non-TOF PET in clinical lesion diagnosis (statistically significant difference), except for the easiest task of liver lesion detection in a 3-min scan of light patients. Although TOF PET did not statistically significantly affect light patient studies of longer duration (3 min) in the liver, in the more challenging situation of lung lesion detection as set up in this study, it provided improved detection and localization capability for all patient sizes and scan times. Using the average ALROC results over all 5 readers (Fig. 5), we also found, as expected, that longer scan times (3 min/bed position) provide improved performance for both TOF and non-TOF, with a more pronounced effect in heavy patients. For lesions in a given organ (liver or lung in this study), the difference in ALROC values between the 2 BMI levels is less for the long scan time (3 min/bed position vs. 1 min/bed position) and is reduced further with the addition of TOF. Thus, a long TOF scan leads to more uniform ALROC values across different patient sizes and organs. The ALROC values for 1-min TOF scans were generally close to the ALROC values for 3-min non-TOF scans for a given patient size and organ. However, the difference was statistically significant (P < 0.01), and 1-min TOF scans were always worse than the 3-min non-TOF scans, except for the case of lung lesion detection in patients with a BMI less than 26. However, because lung lesions were difficult to detect (low ALROC values), longer scan times and TOF imaging may be necessary for adequate clinical interpretation. Consequently, with TOF imaging a scan time between 1 and 3 min/bed position could provide an optimal clinical performance, depending on the task at hand.
When this study was originally conceived, the standard imaging protocol at the University of Pennsylvania PET Center required a 3-min scan per bed position. Currently, the protocol requires a scan time that varies between 1 and 3 min per bed position based on the patient BMI. This change in imaging protocol was based on a visual impression of image quality versus scan time and is consistent with the quantitative results derived in this study using human observers and the ALROC metric.
One consideration of our study, as currently conceived, is that only 2 of our readers had substantial experience in interpretation of clinical PET. Although the results from the other 3 readers generally follow the same ALROC trends, having more experienced readers would be beneficial. Future research that involves subtle, difficult-to-detect lesions should involve a larger number of this type of reader if possible.
For iterative image reconstruction algorithms, the image changes as a function of number of iterations and hence affects readings for lesion detectability. In previous work, we evaluated the change in lesion detectability as a function of iteration number using a numeric observer and observed that, generally, TOF images converge more quickly to a maximum lesion detection signal-to-noise ratio (14). However, whereas the convergence can vary as a function of patient and lesion size, as well as lesion uptake and image reconstruction algorithm, in a clinical environment the number of iterations is generally fixed to a value that provides good images over a range of imaging situations. Hence, for this work we decided to restrict our evaluation to 3 iterations of each reconstruction algorithm. In the previous study, we also noticed a small (5%) change in the signal-to-noise ratio for non-TOF images when reconstructing for more than 3 iterations, whereas the TOF images are close to the maximum signal-to-noise ratio after 3 iterations. So although the absolute value of the ALROC metric may increase a little if non-TOF images with more iterations were used in this work, the general conclusions derived from this study should not change. Future work will involve further investigation of the impact of this parameter on lesion detectability.
In addition, there are 2 other physiologic limitations related to respiratory motion and variability in tumor uptake values that our study did not take into account. The thoracoabdominal region undergoes a considerable amount of motion due to respiration during PET, and this motion is a common source of false-negative results for in vivo lesions at the lung base and in the liver dome. The inserted lesions in our study were in a static position during scanning and thus are likely to be more easily detected than moving in vivo lesions. Future research involving addition of moving lesions and motion-correction techniques could lead to further advances in the detection and characterization of lesions in this challenging anatomic region. Also, the inserted lesions in this study had a fixed local activity ratio, whereas in vivo lesions can vary widely in intensity, and poorly 18F-FDG–avid tumors (e.g., bronchoalveolar carcinoma, hepatocellular carcinoma) can be difficult to detect, particularly when their location is affected by respiratory motion. Future research using added lesions of varying activity could potentially show whether TOF PET is advantageous over non-TOF PET in the setting of poorly 18F-FDG–avid lesions whose activities are even lower than those analyzed in this study.
CONCLUSION
In smaller patients, although short scan times (1 min in this study) may sometimes be adequate for certain clinical diagnoses, longer scan times (3 min in this study) still provide better performance for challenging clinical situations. However, when imaging large patients or a low-uptake lesion in small or large patients, a combination of longer scan time and TOF imaging provides the best performance. Finally, longer TOF scans in all patients provide similar performance for all patient sizes for lesions in the same organ type with similar relative uptake, indicating an ability to provide a more uniform clinical diagnostic capability in most oncologic lesion detection tasks.
DISCLOSURE STATEMENT
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
We thank Matthew Werner (Radiology, University of Pennsylvania, Philadelphia) for help with generating the images. This work was supported by the National Institutes of Health grants R01-CA113941 and R01-EB009056.
© 2011 by Society of Nuclear Medicine
Received for publication December 21, 2010.
Accepted for publication February 18, 2011.