Abstract
The National Electrical Manufacturers Association (NEMA) standard NU 4-2008 for performance measurements of small-animal tomographs was recently published. Before this standard, there were no standard testing procedures for preclinical PET systems, and manufacturers could not provide clear specifications similar to those available for clinical systems under NEMA NU 2-1994 and 2-2001. Consequently, performance evaluation papers used methods that were modified ad hoc from the clinical PET NEMA standard, thus making comparisons between systems difficult. Methods: We acquired NEMA NU 4-2008 performance data for a collection of commercial animal PET systems manufactured since 2000: microPET P4, microPET R4, microPET Focus 120, microPET Focus 220, Inveon, ClearPET, Mosaic HP, Argus (formerly eXplore Vista), VrPET, LabPET 8, and LabPET 12. The data included spatial resolution, counting-rate performance, scatter fraction, sensitivity, and image quality and were acquired using settings for routine PET. Results: The data showed a steady improvement in system performance for newer systems as compared with first-generation systems, with notable improvements in spatial resolution and sensitivity. Conclusion: Variation in system design makes direct comparisons between systems from different vendors difficult. When considering the results from NEMA testing, one must also consider the suitability of the PET system for the specific imaging task at hand.
The use of PET to study animal models of human disease has been expanding. By the late 1990s, several groups had constructed prototype PET systems (1–9) because it had been found that significant benefits in spatial resolution, sensitivity, image quality, and quantification were achievable using systems designed specifically for small laboratory animals. In 2000, commercial preclinical PET systems became available, and over the next 10 y the performance and capabilities of these systems evolved rapidly.
The maturing market for preclinical PET systems led to the need for standardized methods of performance evaluation. Such standardization facilitates acceptance testing and routine monitoring and allows comparison between systems from different vendors and of different designs. To address this need, the National Electrical Manufacturers Association (NEMA) NU 4 standard was published in 2008 (10). Before then, there was no agreed-upon method to evaluate the performance of preclinical PET systems, and manufacturers could not provide specifications as is done for clinical systems under the NEMA NU 2-1994 (11) and NU 2-2001 (12) standards. In addition, performance evaluation articles on many early-generation preclinical PET systems used methods that were modified ad hoc from the clinical NEMA standard. As no two early systems were evaluated in a consistent manner, it was difficult to compare performance between early and newer camera designs.
In this work, we present NEMA NU 4-2008 performance measurements for a collection of preclinical PET systems that span the first 10 y of commercial availability. Our intent is to provide an objective source to which future systems can be compared, understand how the different design decisions of preclinical PET systems affect performance data, and examine whether the NU 4-2008 tests are adequate to characterize performance. We avoid making qualitative statements about whether one system is better than another (except when comparing systems from a single manufacturer). We also do not consider the performance of add-on features such as CT, animal-handling equipment, or data analysis software, all of which may factor into the choice of the optimal system for a given research program.
MATERIALS AND METHODS
Systems
To be included in this work, a PET system needed to have been commercially manufactured since 2000 and be in good working order. Prototype research systems were specifically excluded. The 11 systems included are summarized in Table 1.
Specifications of Systems Included in Test
Testing
All testing followed the NEMA NU 4-2008 standard (10) as closely as possible. We refer the reader to the NU 4-2008 standard for details. Data were collected and analyzed at each contributing site. For each system, the settings used, such as energy and timing windows and coincidence acceptance angle, were those typically applied in routine imaging. We mention these settings, as appropriate, when we list results. Reasonable effort was made to ensure the completeness of the data; however, in some cases complete results could not be obtained. All testing was performed independent of the system manufacturer and represents the performance of a single system of each model.
Spatial Resolution
Spatial resolution was measured using a 22Na point source embedded in a 1-cm³ acrylic cube. For each dataset, the full width at half maximum (FWHM) and full width at tenth maximum (FWTM) are reported for the axial center and ¼-axial-offset positions.
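For illustration, the width measurement can be sketched as follows: the profile maximum is estimated by a parabolic fit through the peak sample and its two neighbors, and the half- and tenth-maximum crossings are found by linear interpolation between adjacent samples, consistent with the NEMA procedure. The function name and the Gaussian test profile below are our own illustration, not part of the standard.

```python
import numpy as np

def profile_width(profile, pixel_mm, frac):
    """Width of a 1-D point-spread profile at `frac` of its maximum.

    The peak amplitude is taken from a parabola fit through the maximum
    sample and its two neighbors; the crossings at `frac` of the peak
    are located by linear interpolation between adjacent samples.
    """
    y = np.asarray(profile, dtype=float)
    i = int(np.argmax(y))
    # Parabolic estimate of the true peak amplitude.
    y0, y1, y2 = y[i - 1], y[i], y[i + 1]
    denom = y0 - 2 * y1 + y2
    peak = y1 - (y0 - y2) ** 2 / (8 * denom) if denom != 0 else y1
    level = frac * peak
    # Left crossing: last sample below `level` before the peak.
    li = np.where(y[:i] < level)[0][-1]
    left = li + (level - y[li]) / (y[li + 1] - y[li])
    # Right crossing: first sample below `level` after the peak.
    ri = i + np.where(y[i:] < level)[0][0]
    right = ri - 1 + (y[ri - 1] - level) / (y[ri - 1] - y[ri])
    return (right - left) * pixel_mm

# Gaussian test profile: FWHM should be close to 2.355 * sigma * pixel size.
x = np.arange(41, dtype=float)
sigma = 3.0  # in samples
g = np.exp(-0.5 * ((x - 20) / sigma) ** 2)
fwhm = profile_width(g, pixel_mm=0.4, frac=0.5)  # ≈2.8 mm
fwtm = profile_width(g, pixel_mm=0.4, frac=0.1)  # ≈5.2 mm
```

For a Gaussian, the linear interpolation introduces only a small bias relative to the analytic widths (2.355σ and 4.292σ), which is why NEMA specifies this simple scheme rather than a full model fit.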
Sensitivity
The NU 4-2008 sensitivity measurement uses the same 22Na point source as used for spatial resolution measurement. We report values only for absolute sensitivity, a unitless percentage corrected for the 0.9060 branching fraction of 22Na. We report the absolute system sensitivity for the mouse length (sMA,tot), calculated as the average absolute sensitivity over the central 7 cm of the axial field of view (FOV), and total absolute system sensitivity (sA,tot), calculated as the average absolute sensitivity over the entire axial FOV. We do not report the absolute system sensitivity for the rat length (sRA,tot), since it is equivalent to total absolute sensitivity for all systems, each of which has an axial FOV of less than 15 cm.
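In terms of the per-slice absolute sensitivities defined in the standard, these averages reduce to a simple calculation. A minimal sketch, assuming the per-slice absolute sensitivities (in percent, already corrected for the branching fraction) and the axial slice positions are available (the function and variable names are ours):

```python
import numpy as np

def nu4_sensitivities(slice_sens_pct, slice_z_mm):
    """Mouse and total absolute sensitivity from per-slice values.

    slice_sens_pct : absolute sensitivity of each slice (%), already
                     corrected for the 0.9060 branching fraction of 22Na
    slice_z_mm     : axial position of each slice center (mm),
                     with 0 at the axial center of the FOV

    Slices are assumed to have uniform thickness, so the averages are
    unweighted means over the relevant axial extent.
    """
    s = np.asarray(slice_sens_pct, dtype=float)
    z = np.asarray(slice_z_mm, dtype=float)
    total = s.mean()                      # average over the entire axial FOV
    mouse = s[np.abs(z) <= 35.0].mean()   # average over the central 7 cm
    return mouse, total
```

Because the axial sensitivity profile peaks at the center of the FOV, the mouse sensitivity is always at least as large as the total sensitivity for systems with an axial FOV longer than 7 cm.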
For 2 systems (Inveon and Argus), a centered line source filled with 18F and surrounded by an aluminum cylindric shell was used instead of the 22Na source. In these cases, the sensitivity measured with the 18F line source was calibrated by benchmarking the peak sensitivity in the central slice of the tomograph to a measurement made with the 22Na source at the center of the FOV. The line source measurement gives values for absolute sensitivity that differ from the point source measurement by less than 1%, as shown with the microPET P4 (13).
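The benchmarking step amounts to a single scale factor applied to the whole line-source axial profile. A minimal sketch (function and variable names are our own illustration):

```python
def calibrate_line_profile(line_profile, point_center_sens):
    """Rescale an 18F line source axial sensitivity profile so that its
    peak (central-slice) value matches a 22Na point source measurement
    made at the center of the FOV.

    line_profile      : per-slice sensitivities measured with the line source
    point_center_sens : absolute sensitivity measured with the 22Na point
                        source at the center of the FOV
    """
    scale = point_center_sens / max(line_profile)
    return [v * scale for v in line_profile]
```

The shape of the axial profile comes from the rapid line-source scan, while the absolute scale comes from the single calibrated point-source measurement.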
Scatter Fraction, Count Losses, and Random Coincidence Measurements
The NU 4-2008 methodology for the counting-rate test closely follows the NU 2-2001 methodology for clinical PET, in which a line source filled with 18F is inserted along the length of a high-density polyethylene cylinder. In the NU 4-2008 standard, 3 phantom sizes are used: a mouse phantom (70 mm long, 25 mm diameter [Ø]), a rat phantom (150 mm long, 50 mm Ø), and a monkey phantom (400 mm long, 100 mm Ø).
For each system and phantom tested, we report the peak noise-equivalent counting rate (NECR), the activity at which peak NECR occurs, and the low-counting-rate scatter fraction. In addition, we report NECR at 3.7 MBq for the mouse phantom and 10 MBq for the rat phantom, as these activity levels correspond to values that are often encountered in routine imaging.
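NECR combines the trues (T), scatter (S), and randoms (R) rates into a single figure of merit, NECR = T²/(T + S + R). A short sketch of how the peak NECR and the activity at which it occurs can be extracted from counting-rate data; the rate model below is purely illustrative, not measured data from any system:

```python
import numpy as np

def necr_curve(trues, scatter, randoms):
    """Noise-equivalent counting rate, NECR = T^2 / (T + S + R)."""
    t = np.asarray(trues, dtype=float)
    s = np.asarray(scatter, dtype=float)
    r = np.asarray(randoms, dtype=float)
    return t ** 2 / (t + s + r)

# Illustrative rate model vs. activity (kcps, MBq); not measured data.
activity = np.linspace(0.5, 150, 300)
trues = 8.0 * activity / (1 + 0.01 * activity)  # trues saturate with dead time
scatter = 0.1 * trues                           # ~9% scatter fraction
randoms = 0.02 * activity ** 2                  # randoms grow ~A^2
necr = necr_curve(trues, scatter, randoms)

peak_necr = necr.max()
peak_activity = activity[np.argmax(necr)]
```

The curve rises roughly linearly at low activity, where trues dominate, and falls beyond the peak as randoms (growing with the square of activity) and dead-time losses take over, which is why both the peak value and its activity are reported.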
Image Quality and Accuracy of Attenuation and Scatter Corrections
The NEMA NU 4-2008 method uses a fillable phantom (66 mm long, 33.5 mm Ø) for the image quality test. For the uniform cylinder region, we report the maximum and minimum pixel values as ratios to the mean value, and the SD of the pixel values as a percentage of the mean. For the cold cylinder regions, we report the spillover ratio. For the hot rod region, we report the recovery coefficients. On each system, the phantom was imaged for 20 min with an activity level of 3.7 MBq of 18F. All available corrections were applied to the data. All systems had corrections for normalization, dead time, and randoms, but not all systems had corrections available for scatter and attenuation at the time of testing.
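In terms of region-of-interest statistics, the reported quantities reduce to simple ratios. A minimal sketch, assuming the ROI pixel values have already been extracted from the reconstructed image (the function names and example values are ours, and the recovery-coefficient helper omits the noise-propagation step the standard also specifies):

```python
import numpy as np

def uniformity_metrics(uniform_pixels):
    """Max/mean ratio, min/mean ratio, and %SD for the uniform region."""
    u = np.asarray(uniform_pixels, dtype=float)
    mean = u.mean()
    return u.max() / mean, u.min() / mean, 100.0 * u.std() / mean

def spillover_ratio(cold_pixels, uniform_mean):
    """Mean of a cold (water or air) compartment over the uniform mean."""
    return np.asarray(cold_pixels, dtype=float).mean() / uniform_mean

def recovery_coefficient(rod_value, uniform_mean):
    """Hot rod recovery coefficient: measured rod value over the mean of
    the uniform region (ideally 1 for a fully recovered rod)."""
    return rod_value / uniform_mean
```

Because all three metrics are normalized to the uniform-region mean, they are insensitive to the absolute calibration of the scanner and isolate the effects of resolution, scatter, and attenuation on the image.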
RESULTS
Spatial Resolution
Table 2 lists the FWHM and FWTM spatial resolution for each system. Spatial resolution was generally better at the ¼-axial-offset position than at the center, particularly for axial resolution, because the central position uses more oblique lines of response than the ¼-axial-offset position. This effect was particularly noticeable for systems with long axial FOVs.
Spatial Resolution Results
The NU 4-2008 standard requires a filtered backprojection (FBP) algorithm to reconstruct the point source data. This requirement is problematic for system designs that have irregular crystal spacing in the azimuthal and axial directions, such as the LabPET systems. Systems of this design do not typically use FBP algorithms because of the degradation of resolution and artifacts introduced by the interpolation and rebinning of measured data onto projections with regular spacing. Such artifacts, which make the resolution unstable across the FOV, are apparent in the FWHM resolution data measured for the LabPET 8, in which the FWHM radial resolution increases from approximately 1.6 mm at a 5-mm radial offset to 1.9 mm at a 10-mm radial offset. Consequently, the results for the LabPET axial resolution are the intrinsic resolution obtained by finely stepping the point source through the axial direction. Because these results will be the same for the LabPET 8 and the LabPET 12, only LabPET 8 results are presented in Tables 2 and 3. Vendors of most preclinical PET systems offer iterative 3-dimensional reconstruction algorithms, many of which include spatially variant models of system response. As a result, it may be unusual to use FBP reconstruction in routine preclinical PET. For these systems, the FBP algorithm requirement in the NU 4-2008 standard leads to conditions never realized in routine imaging. This limitation of the NU 4-2008 spatial resolution test does not presently have a practical solution.
Table 3 compares the system crystal size with the effective transaxial FWHM resolution at the 5-mm-offset radial position. Effective transaxial FWHM resolution is calculated as the geometric mean of the radial and tangential resolutions according to

FWHM_effective = √(FWHM_radial × FWHM_tangential).
Comparison of Effective Transaxial FWHM Resolution with Crystal Size
Plot of FWHM and FWTM spatial resolution for LabPET 8 system for 2 energy windows.
Figure 2 shows the values of effective transaxial FWHM resolution for each system plotted against radial offset position. The plots can clearly be grouped into 2 families of systems: those manufactured before 2003, that is, first-generation commercial systems, and those manufactured after 2003, that is, second-generation systems.
Comparison of spatial resolution of first-generation animal PET systems with later-generation systems.
Sensitivity
Table 4 shows the values of mouse sensitivity and total sensitivity for each system. As expected, the largest factor affecting detection efficiency is the solid-angle coverage of the detector ring, with higher values for long-axial-FOV and small-ring-diameter systems.
Sensitivity over Central 7 cm of Axial FOV (Mouse Sensitivity), Complete Axial FOV (Total Sensitivity), and Peak Detection Efficiency for Each System Tested
Scatter Fraction, Count Losses, and Random Coincidence Measurements
Table 5 summarizes the results of the counting-rate test for mouse- and rat-sized phantoms. In 2 cases, the peak NECR values were not reached because of limited starting activity. This problem is due to the low volume of the line source and is more likely to occur for the mouse phantom. As expected, the scatter fraction was generally lower for larger-ring systems and for narrower energy windows. The exception was the VrPET system, which used an energy window of 100–700 keV yet had a scatter fraction lower than that of any system using a 250- to 750-keV window. The most likely reason is that the VrPET is a partial-ring system and thus has less scatter from gantry materials. This explanation is consistent with the work of Yang and Cherry (14), who showed that for mouse-sized phantoms imaged in the microPET II, the dominant source of scatter was the gantry. The scatter fraction was lowest for systems that use conventional single-layer block detector designs, such as the Siemens family of systems, or the pixelated Anger logic approach of the Mosaic HP. The highest observed scatter fractions were for the 2 dual-layer systems, with the ClearPET having a scatter fraction of 31% and the Argus having a scatter fraction of 21% for the mouse phantom. It is not clear whether the increased scatter fraction is due to event mispositioning in the block, high levels of gantry scatter events, or the effect of using scintillators with lower photofractions (gadolinium oxyorthosilicate for the Argus, lutetium yttrium aluminum perovskite for the ClearPET). The ClearPET and the Argus were the 2 systems with the smallest ring diameters, which likely has an effect on the amount of gantry scatter. The LabPET systems, with their individual crystal readout design, have scatter fractions between these 2 extremes. It is believed that the higher scatter fraction measured for the LabPET systems is due to increased gantry scatter from the Kovar (Carpenter Technology Corp.) packages surrounding the detector modules.
Summary of Counting-Rate Test Results for Mouse and Rat Phantoms
The peak NECR value and activity level at which it occurs represent a complex interplay between system design factors. Compared with clinical PET systems, preclinical systems from different vendors have a much wider variation in design, making it difficult to directly use NECR for comparing systems. It is, however, instructive to compare a few systems directly to understand the effects of system differences on the counting-rate results. When the microPET R4 is compared with the microPET Focus 120, the improved sensitivity of the microPET Focus 120 results in significantly higher NECR values at lower activity levels than for the microPET R4. The effects of an extended axial FOV can be seen by comparing the LabPET 8 with the LabPET 12. For the mouse phantom, the peak NECR increased from 279 kcps for the LabPET 8 to 362 kcps for the LabPET 12, with only a 1 MBq change in the activity at which peak NECR occurs. The effects of system ring diameter can be seen by comparing the microPET P4 and microPET R4 results for the rat phantom. The peak NECR for the 2 systems was similar; however, the activity at which peak NECR occurs is larger by nearly a factor of 2 for the microPET P4. The Inveon system had the highest values of peak NECR for the mouse and rat phantoms. A key reason for these high NECR values is the minimal block dead time due to the Quicksilver processing electronics (15), which allow minimal pulse shaping before digitization with 100-MHz analog-to-digital converters and a timing window of 3.4 ns.
Figures 3 and 4 show the NECR counting-rate curves for the systems for the mouse and rat phantoms, respectively. The general shape of the curves is similar for all systems, with an extended linear range of NECR versus activity below the peak NECR value. For all systems, this linear range extends at least up to 10 MBq, which is sufficient for performing most imaging studies in rodents.
Plot of NECR vs. activity for mouse-sized phantom.
Plot of NECR vs. activity for rat-sized phantom.
Table 6 summarizes the results of the counting-rate test for the monkey-sized phantom. Data on this phantom were acquired using only the 2 systems with the largest ring diameters, the microPET P4 and microPET Focus 220. For this test, the microPET P4 used an energy window of 350–650 keV and the microPET Focus 220 used an energy window of 250–700 keV. This wider energy window results in a significant increase in the scatter fraction from 35.5% for the microPET P4 to 46.6% for the microPET Focus 220.
Summary of Counting-Rate Test Results for Monkey-Sized Phantom
Image Quality and Accuracy of Attenuation and Scatter Corrections
Table 7 summarizes the results from the image-quality phantom for each system tested. The results of the image-quality test are highly dependent on the reconstruction algorithm and the corrections applied. This point is illustrated in Figure 5 for recovery coefficients measured for the microPET P4 system for data reconstructed using 5 different methods: Fourier rebinning followed by 2-dimensional FBP, Fourier rebinning followed by 2-dimensional ordered-subsets expectation maximization, maximum a posteriori (MAP) with β = 0.1, MAP with β = 0.447, and 3-dimensional reprojection. Recovery coefficients greater than 1 are measured with the MAP reconstructions, likely caused by a combination of using an iterative algorithm to reconstruct pointlike objects in a region without background activity and basing the recovery coefficient on a single-pixel measurement from the average image created by summing a 10-mm axial region.
Summary of Results from Image-Quality Phantom
Plots of recovery coefficient for the microPET P4 system for 5 reconstruction algorithms. MAP = maximum a posteriori; OSEM2D = 2-dimensional ordered-subsets expectation maximization; 3DRP = 3-dimensional reprojection; 2DFBP = 2-dimensional filtered backprojection.
The variability in the results from the microPET P4 system makes it difficult to compare the results of the image-quality phantom across systems from different manufacturers. In general, systems with lower scatter fractions in the mouse phantom counting-rate test had lower spillover ratios, regardless of whether corrections were applied. As discussed by Yang and Cherry (14), this observation may reflect the fact that much of the scatter in preclinical PET originates from sources other than the object being imaged, whereas scatter correction methods assume that scatter originates in the object. For systems that did not use either scatter or attenuation correction in the image-quality test, the spillover ratio for the water compartment correlated with the scatter fraction measured in the counting-rate test using the mouse-sized phantom, which is similar in size to the image-quality phantom (Fig. 6).
Plot of spillover ratio in water compartment vs. scatter fraction measured in counting-rate test using mouse-sized phantom. Line has slope of 1.
Figure 7 shows transverse images through the cold compartment region and coronal images through the 5-mm hot rod for a selection of systems tested. The higher spillover ratios in the uncorrected images from the ClearPET and LabPET 12 can be seen as an increase in apparent activity in the cold compartments.
Image-quality phantom images. Intensity scale of each image is set so that minimum value is 0 and maximum value is 1.25 times mean value of uniform cylinder region.
DISCUSSION
The data show a steady improvement in system performance for newer systems as compared with first-generation systems. This trend is most clearly seen in the improvement in spatial resolution for systems produced after 2003, with all newer systems having an average in-plane FWHM resolution of better than 2 mm over the central 30-mm diameter of the FOV. Similarly, there has been a steady improvement in system sensitivity, driven largely by the extended axial coverage of newer systems.
Several observations were made about the NEMA NU 4-2008 standard. The spatial resolution test requires FBP reconstruction. This test thus favors systems with a ring geometry and limited axial extent and penalizes systems with unconventional geometries. It is therefore possible that a system with poor measured spatial resolution may produce images of exceptional resolution and quality when reconstructed with an iterative algorithm. The sensitivity measurement, performed by stepping a 22Na point source through the axial FOV of the system, requires a large number of repetitive measurements and is time-consuming. We suggest that a line source measurement, benchmarked to a measurement made with the 22Na source at the center of the FOV, can be a rapid and accurate alternative. The counting-rate tests are useful for specifying the range of activities suitable for use in the system for various animal sizes. However, the NU 4 test includes no requirement to evaluate the ability of the PET system to form an image at activity levels other than the 3.7 MBq used in the image-quality phantom. This is a distinct difference from the NU 2 standard for clinical PET, in which the counting-rate data are reconstructed to determine up to what activity level the system dead-time correction functions accurately and the image is free from pileup artifacts. We suggest that the counting-rate test be modified to include a requirement for reconstructing the counting-rate data and analyzing the resultant images to determine the quantitative accuracy of the system as a function of counting rate.
When comparing the NEMA test results from preclinical PET systems from different manufacturers, it is important to remember that, beyond enabling comparison of PET systems, the NU 4-2008 standard is also intended to provide a standard set of tests and methods for manufacturers to specify the performance of their imaging systems and for customers to perform acceptance testing and long-term monitoring. Therefore, the number of tests is purposefully limited so that the data can be acquired in a timely manner. However, additional metrics of performance can be evaluated to provide valuable information; one example is the measurement of spatial resolution at low and high counting rates.
CONCLUSION
In this work we have collected and presented NEMA NU 4-2008 performance data for 11 preclinical PET systems commercially manufactured since 2000. Their performance over the range of tests reflects the unique design attributes of each system, the settings at which it is operated, and the manner in which the data are handled and reconstructed. In general, there is a much wider variation in the design of preclinical PET systems than in clinical PET systems because of the different approaches implemented to push the limits of resolution and sensitivity. This variation in system design makes direct comparisons between systems from different vendors difficult: the results of NEMA testing must be weighed against the suitability of the PET system for the specific imaging task at hand. The data show a steady improvement in system performance for newer systems as compared with first-generation systems, with notable improvements in spatial resolution and sensitivity.
DISCLOSURE STATEMENT
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
This work was funded by the Natural Sciences and Engineering Research Council of Canada under Discovery Grant 341628-2007. No other potential conflict of interest relevant to this article was reported.
Footnotes
Published online Jun. 14, 2012.
- © 2012 by the Society of Nuclear Medicine and Molecular Imaging, Inc.
REFERENCES
- Received for publication October 12, 2011.
- Accepted for publication March 12, 2012.