The ancients relied on just their visual interpretations of bright objects in the heavens. But advancing technology led to quantifications that properly normalized stellar distances to obtain absolute magnitudes. Somewhat analogously, in PET the standardized uptake value (SUV) came to be used as a tool to supplement visual interpretation. Uptake normalization as a fraction of the injected dose/unit weight has been in use as far back as 1941 (1). It was designated as the differential absorption ratio (DAR) and in the 1980s was being used in PET (2). Aliases such as the differential (or dose) uptake ratio (DUR) and standardized uptake ratio (SUR) occasionally appear in the literature.
The SUV is a special member of a class of dimensionless ratios of Q (= average activity per unit volume) in use: tissue Q ÷ a normalizing Q. The latter can be a contralateral region, a background, an organ (e.g., liver, brain, and so forth), and, in particular, the whole body, as in SUV = tissue Q ÷ whole-body Q including tracer excretions = tissue Q ÷ injected dose per unit body volume, weight, or area. For the (time-invariant) denominator, rather than a region of interest (ROI) around the whole body or units of volume, there are traditional and convenient uses of weight or body surface area, which give a dimensioned result (in mg/mL or m²/mL). The SUV (in mg/mL units), when averaged over the entire body, would equal the body density.
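As a minimal sketch of the weight-normalized definition above, the following Python fragment computes an SUV and checks the whole-body property numerically. The function name, argument names, and all numeric values are hypothetical; activities are assumed to be decay-corrected to a common time, and the result carries whatever weight unit is supplied, per mL.

```python
def suv_body_weight(tissue_conc, injected_dose, body_weight):
    """SUV = tissue activity concentration / (injected dose / body weight).

    tissue_conc   : activity per mL of tissue (e.g., kBq/mL)
    injected_dose : total injected activity in the same activity unit (e.g., kBq)
    body_weight   : body weight; the result is in this weight unit per mL
    """
    if injected_dose <= 0 or body_weight <= 0:
        raise ValueError("injected dose and body weight must be positive")
    return tissue_conc / (injected_dose / body_weight)


# Hypothetical patient: 370 MBq injected, 70 kg (70,000 g), lesion at 5 kBq/mL.
dose_kbq = 370_000.0
weight_g = 70_000.0
lesion_suv = suv_body_weight(5.0, dose_kbq, weight_g)

# Whole-body check: if all injected activity (including excretions) were
# averaged over a hypothetical body volume, the SUV reduces to weight/volume,
# i.e., the body density.
volume_ml = 69_000.0
whole_body_conc = dose_kbq / volume_ml
whole_body_suv = suv_body_weight(whole_body_conc, dose_kbq, weight_g)
body_density = weight_g / volume_ml

print(lesion_suv, whole_body_suv, body_density)
```

The whole-body result matches the density because the injected dose cancels between numerator and denominator, leaving weight ÷ volume.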
Commonly referred to as semiquantitative analysis, SUVs owe their popularity to the simplicity of the method compared with others. In particular, it is of interest to compare the SUV with the more fully quantitative influx constant Ki = k1k3/(k2 + k3), which requires more effort. Fortunately, the end-of-scan SUV can be essentially as discriminating diagnostically as Ki. This is because when Ki exists (i.e., k3 ≠ 0), the 2 are quite closely proportional, with a population-average proportionality constant depending only on the time of SUV evaluation and the type of tracer (3,4) and, in particular, being virtually independent of the tissue type.
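For orientation, the influx constant in its standard form for irreversible trapping, Ki = k1k3/(k2 + k3), can be evaluated as in the sketch below. The function name and the rate-constant values are hypothetical and chosen only to show the arithmetic; they are not taken from the cited studies.

```python
def influx_constant(k1, k2, k3):
    """Net influx constant Ki = k1*k3 / (k2 + k3) for an irreversibly trapped tracer.

    k1 is the uptake rate constant, k2 the washout rate constant, and k3 the
    trapping (e.g., phosphorylation) rate constant. Ki takes the units of k1.
    """
    if k1 < 0 or k2 < 0 or k3 < 0:
        raise ValueError("rate constants must be non-negative")
    if k2 + k3 == 0:
        raise ValueError("k2 + k3 must be positive")
    return k1 * k3 / (k2 + k3)


# Hypothetical rate constants (per minute), for illustration only.
ki = influx_constant(k1=0.10, k2=0.15, k3=0.05)
print(ki)

# With no trapping (k3 = 0) there is no net influx, so Ki is zero.
print(influx_constant(k1=0.10, k2=0.15, k3=0.0))
```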
A reliable, reproducible measure of uptake is sought. An underlying reason for this stems from wanting to make comparisons. Intrapatient comparisons occur during therapy monitoring. Interpatient relative uptakes in the diagnostic process can be related to degrees of pathology. Moreover, as knowledge bases form for a particular disease, contributions arise from several institutions. To facilitate comparisons in all of these settings, one hopes to address any unwanted dispersion in a marker such as the SUV. Such dispersion would be about the marker's true physiologic value for a specific disease condition when methodology preferences differ, which is the main issue in this perspective.
In seeking reproducible uptake measures, the diagnostician has a large armamentarium from which to choose according to Hoekstra et al. (5). The authors discuss 9 classes of analytic methods suitable for PET image analysis. These, including the SUV class, have subclasses. For the SUV, these subclasses have historically arisen out of a motivation to seek refinements in determinations that can reduce variability from method variations.
METHOD-INFLUENCING FACTORS
The SUV is subject to errors that can arise in various quantitative methods (6,7). A criticism of being casual about SUV methodology has been made (8), citing several factors to consider. However, as part of addressing PET procedures in a broader sense, several consensus groups (9–13)—in discussing some of these factors—have made protocol suggestions. SUV-related topics, along with other factors, are organized in Table 1, which distinguishes confounding factors (i.e., potentially addressable by adjusting SUV results) and defined factors (i.e., biologic states conventionally considered as distinct entities). A possible application of Table 1 would be an aid while writing a Methods section when SUVs are being reported—that is, indicate fully, not just partially, how SUV is being measured.
An aspect of facilitating comparisons is using variability-reducing corrections for the factors in Table 1. Statistical error in an average or reference result can be reduced to the extent that significant changes in confounding variables (e.g., in tumor size, SUV’s time after injection, and so forth) are either used in corrections or kept minimal. It is desirable to have some reference conditions for the purpose of increasing the number of patients that can be grouped together in a study to gain statistical power. Also, diagnostic advantages accrue when variabilities from extraneous factors are reduced. That some SUV methods excel over others is seen in an example in which receiver-operating-characteristic (ROC) areas in 18F-FDG PET breast cancer diagnosis can vary from 0.81 to 0.91 among the methods (20).
In this issue of The Journal of Nuclear Medicine, on pages 1519–1527, Boellaard et al. (21) use simulation to systematically investigate several of the influences in Table 1. Rather than attempting to isolate each influencing effect while all others remain constant, their approach was a practical one: varying several readily controlled and clinically meaningful parameters (sometimes with individual effects therefore combining) in their simulation model. Their extensive data show how these affect R = the observed-to-actually-present counts ratio for an ROI = the observed ÷ the actual numerator of an SUV calculation. The scope of this research is not intended to address the last 3 factors in Table 1—that is, biologic factors and those influencing the denominator of an SUV calculation. The parameters varied are the type of ROI, the presence or absence of a spatial filter for one particular reconstruction algorithm, the noise equivalent counts collected, the number of pixels in the reconstruction matrix, uniform spheric lesion size, and the lesion-to-background ratio. The results show R ranging quite remarkably from as low as 0.4 to as high as 2.9 due to the synergism of having several parameters at once at their extremes. A much smaller, but more meaningful, measure of variability would be something like the SD of R expected due to random occurrences of various parameters having values typically encountered clinically.
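The upward bias of a maximum-pixel ROI under noise can be illustrated with a toy Monte Carlo sketch. This is not the simulation of Boellaard et al. (21): it assumes a uniform region with independent Gaussian pixel noise and ignores resolution, partial-volume, and reconstruction effects. All names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_roi_ratios(noise_sd, true_value=1.0, n_pixels=50,
                        n_trials=2000, seed=0):
    """Return (mean-ROI R, max-pixel R) averaged over simulated noisy regions.

    R is the observed-to-actual ratio: the ROI result divided by the true
    uniform pixel value.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    mean_ratios = []
    max_ratios = []
    for _ in range(n_trials):
        pixels = [rng.gauss(true_value, noise_sd) for _ in range(n_pixels)]
        mean_ratios.append(statistics.fmean(pixels) / true_value)
        max_ratios.append(max(pixels) / true_value)
    return statistics.fmean(mean_ratios), statistics.fmean(max_ratios)


# Compare a lower and a higher pixel-noise level (fractions of the true value).
for sd in (0.1, 0.2):
    mean_r, max_r = simulate_roi_ratios(noise_sd=sd)
    print(f"noise SD {sd}: mean-ROI R = {mean_r:.3f}, max-pixel R = {max_r:.3f}")
```

In this sketch the mean-based R stays close to 1, whereas the maximum-pixel R exceeds 1 and grows with the noise level, because the maximum selects the largest positive fluctuation in the region.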
Their simulation approach was to extract a part of the reconstruction code for their scanner and add features required for the investigation. Its results validate quite well against scan data from varieties of chest phantom conditions. Hence, the simulation could be used with confidence as influencing parameters are varied. It would appear that a subsequent use for this particular algorithm could possibly be using calculated R values to correct future SUVs for their off-normal parameters compared with some reference set of parameters. Strict validity of such corrections, however, would apply only to the chest region of this particular scanner and reconstruction for which the study was intended. With this approach, notwithstanding parameter differences, possible interinstitutional comparisons of corrected SUVs might be envisioned in which the PET hardware and software are a commonality. Otherwise, as Boellaard et al. (21) caution, SUV comparisons among institutions cannot be made casually—in contrast to acceptable comparisons they show possible for same-patient intrainstitutional studies.
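One way such a correction might look, as a sketch only, is to rescale an observed SUV by the ratio of the R calculated for a reference parameter set to the R calculated for the parameters actually used. The function name, the form of the rescaling, and the numeric values are assumptions made for illustration; the source does not prescribe a specific formula.

```python
def correct_suv_to_reference(observed_suv, r_actual, r_reference=1.0):
    """Rescale an observed SUV to the value expected under reference parameters.

    r_actual    : calculated observed-to-actual ratio for the parameters used
    r_reference : calculated ratio for the reference parameters; 1.0 means the
                  correction targets the actually present activity
    """
    if r_actual <= 0 or r_reference <= 0:
        raise ValueError("R values must be positive")
    return observed_suv * r_reference / r_actual


# Hypothetical case: a small lesion measured with R = 0.6 (under-recovery).
observed = 3.0
print(correct_suv_to_reference(observed, r_actual=0.6))

# Hypothetical case: the same measurement referred to a reference protocol
# whose own R for this lesion would be 0.8.
print(correct_suv_to_reference(observed, r_actual=0.6, r_reference=0.8))
```

As the text cautions, R values calculated this way would strictly apply only to the scanner, reconstruction, and body region for which the simulation was validated.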
This work is significant in several respects. It quantifies, for their particular chest phantom, the SUV variability encountered due to a variety of factors in combination. In particular, it explores a little-publicized upward biasing effect of higher pixel noises in ROIs based on the maximum single pixel value. The variabilities associated with all parameters studied are useful to observe, both by those measuring SUVs and by others using these. The work also demonstrates how simulation can be useful in addressing these influences. Possibly, some day, interinstitutional SUV comparisons may be more confidently made with research of this type used in combination with some standardization of methods.
SUV USAGE
Visual interpretation, as the bulwark of radiology, is typically combined with other information in diagnoses. Of the latter, the SUV is closely allied to the image reading process. Among the reader’s mental processes are qualitative comparisons of activities: within the image or with prior experience. Hence, unless comparisons are limited to images acquired by a frozen set of methods within an institution, the reader must be aware of the effects listed in Table 1. Having corrected SUVs available, along with knowledge of other factors influencing them, can be an aid.
If, after competing with or supplementing other analytic methods, the SUV has been chosen, Table 1 implies various choices to be made involving measurement parameters. The specific use of an SUV determination can have a bearing on the methods used. For example, correcting the 18F-FDG SUV for serum glucose (i.e., traditionally SUV × [glucose concentration ÷ a standard 100 mg/dL], though data for applicability to each unique tissue type should support this) can be appropriate when monitoring the same patient during therapy. On the other hand, many reports in the literature show no statistically proven advantage in applying this in studies composed of varieties of tissues. As another example, determining whether an intrinsically appropriate partial-volume correction is in fact beneficial can depend on prior experience—that is, whether an expected diagnostic advantage has been shown to be statistically significant in similar circumstances.
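The traditional serum glucose correction described above, SUV × (glucose concentration ÷ 100 mg/dL), amounts to a single scaling step. The sketch below uses hypothetical function and argument names and an illustrative glucose value.

```python
def glucose_corrected_suv(suv, glucose_mg_per_dl, reference_mg_per_dl=100.0):
    """Scale an 18F-FDG SUV by serum glucose relative to a standard 100 mg/dL."""
    if glucose_mg_per_dl <= 0 or reference_mg_per_dl <= 0:
        raise ValueError("glucose concentrations must be positive")
    return suv * glucose_mg_per_dl / reference_mg_per_dl


# Hypothetical follow-up scan: measured SUV of 4.0 at a serum glucose of 125 mg/dL.
print(glucose_corrected_suv(4.0, 125.0))  # 5.0
```

At the standard 100 mg/dL the correction leaves the SUV unchanged, so the correction matters only to the extent that glucose levels differ between the scans being compared.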
A popular usage of SUVs is their capability in helping to distinguish between benign and malignant lesions. For example, a study might find an SUV of 2.5 as appropriate for separating certain benign and malignant lesions. Caution, however, must be exercised using such a cutoff outside of the institution and the application for which it was determined. This is because there are institutional differences in the degree of diagnostic conservatism (i.e., choice of operating point on the ROC), the patient population, the specific pathology studied, and the specific methodology involved in determining the SUV. Interinstitutional variability stemming from the latter 3 categories is evident from Table 2, extracted from a meta-analysis of 18F-FDG PET studies, each having ≥20 patients with SUVs (22). But, if population character and pathology factors existing within the disease categories in Table 2 could be eliminated, a much better picture might emerge. This is suggested somewhat from a subset of 20- to 40-y olds within a meta-analysis (23) of institutions studying the coefficients of variation (CVs) of normal whole brains’ metabolic rates: The average of the CVs of individual studies within 26 institutions = 0.15; the CV contribution due solely to interinstitutional differences = 0.14; and the total (i.e., combined patient and institutional variabilities) CV of metabolic rates among 26 institutions = 0.20. These metabolic rate CVs contrast with the much larger SUV interinstitutional variabilities apparent within categories of Table 2.
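The 3 brain metabolic rate CVs quoted above are roughly consistent with independent contributions adding in quadrature. The sketch below checks this arithmetic. Independence of the two components is an assumption made here, not a statement of how the meta-analysis (23) derived its figures, and the function name is hypothetical.

```python
import math


def combined_cv(within_cv, between_cv):
    """Combine two independent relative variabilities in quadrature."""
    return math.hypot(within_cv, between_cv)


# Values quoted in the text: average within-study CV and interinstitutional CV.
total = combined_cv(0.15, 0.14)
print(round(total, 3))
```

The result, about 0.205, is close to the reported total CV of 0.20.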
Finally, it might be tempting to suggest that, to avoid uncertainties or inaccuracies in an SUV approach, one should turn to fully quantitative methods such as Ki determinations. These determinations offer the benefit of avoiding issues of evaluation time and body size normalization, though typically they exhibit slightly higher intrainstitutional interpatient variability (24) than that found in SUVs. But the other factors in Table 1 still remain to be addressed. With the proportionality constant between SUV and Ki being only physiologically (and not methodology) based (3,4), there is a suggestion that if Table 2 were for Ki determinations, the conclusion could be the same: substantial variability among institutions within each particular category of studies.
CONCLUSION
Messages to carry away from the work of Boellaard et al. (21), supplemented by Table 1 presented here, are the desirability of standardization of protocols and analyses and that both measurer and user of SUVs have an awareness of all influencing factors. Many, though not all, of the latter also apply to influx and rate constant determinations and, to some extent, qualitative visual interpretations. Fortunately, within an institution there can be its preferred de facto standardized approach to SUVs. However, there might be changes over time or lack of acceptance by all, and special caution must obviously be exercised in interinstitutional SUV usage, which presently is difficult. Helpful for the time being would be a better-documented specification of methods in publications than may be customary, considering Table 1 for guidance. To make the most of this diagnostic tool, as well as benefit from fully quantitative analytic methods, challenges for the future might include:
Further efforts by organizations to reach consensus on standardized approaches in scanner data acquisition and analysis, building on past accomplishments (9–13).
Highly automated user-friendly software that corrects and reports, perhaps by a simulation algorithm, SUVs along with values of their influencing parameters.
Along with patient data, possible reporting of scanned standard phantoms, as recommended by a PET Data Analysis Working Group (9) and also suggested by Boellaard et al. (21), possibly including SUVs, as known activity ratios of local to whole phantom, for the actual geometries and activities being encountered.
More research of the type reported in this issue.
Footnotes
Received Mar. 3, 2004; revision accepted Jul. 16, 2004.
For correspondence or reprints contact: Joseph A. Thie, PhD, University of Tennessee, 12334 Bluff Shore Dr., Knoxville, TN 37922.
E-mail: jathie@utk.edu