Abstract
PET-based treatment response assessment typically measures the change in maximum standardized uptake value (SUVmax), which is adversely affected by noise. Peak SUV (SUVpeak) has been recommended as a more robust alternative, but its associated region of interest (ROIpeak) is not uniquely defined. We investigated the impact of different ROIpeak definitions on quantification of SUVpeak and tumor response. Methods: Seventeen patients with solid malignancies were treated with a multitargeted receptor tyrosine kinase inhibitor resulting in a variety of responses. Using the cellular proliferation marker 3′-deoxy-3′-18F-fluorothymidine (18F-FLT), whole-body PET/CT scans were acquired at baseline and during treatment. 18F-FLT–avid lesions (∼2/patient) were segmented on PET images, and tumor response was assessed via the relative change in SUVpeak. For each tumor, 24 different SUVpeaks were determined by changing ROIpeak shape (circles vs. spheres), size (7.5–20 mm), and location (centered on SUVmax vs. placed in highest-uptake region), encompassing different definitions from the literature. Within each tumor, variations in the 24 SUVpeaks and tumor responses were measured using coefficient of variation (CV), standard deviation (SD), and range. For each ROIpeak definition, a population average SUVpeak and tumor response were determined over all tumors. Results: A substantial variation in both SUVpeak and tumor response resulted from changing the ROIpeak definition. The variable ROIpeak definition led to an intratumor SUVpeak variation ranging from 49% above to 46% below the mean (CV, 17%) and an intratumor SUVpeak response variation ranging from 49% above to 35% below the mean (SD, 9%). The variable ROIpeak definition led to a population average SUVpeak variation ranging from 24% above to 28% below the mean (CV, 14%) and a population average SUVpeak response variation ranging from only 3% above to 3% below the mean (SD, 2%).
The size of ROIpeak caused more variation in intratumor response than did the location or shape of ROIpeak. Population average tumor response was independent of size, shape, and location of ROIpeak. Conclusion: Quantification of individual tumor response using SUVpeak is highly sensitive to the ROIpeak definition, which can significantly affect the use of SUVpeak for assessment of treatment response. Clinical trials are necessary to compare the efficacy of SUVpeak and SUVmax for quantification of response to therapy.
PET continues to gain importance as a tool to assess response to therapy. Typically, the change in standardized uptake value (SUV) is measured to quantify treatment response (1). Patients are then classified into different response categories based on the relative change in SUV. These categories include complete response, partial response, stable disease, and progressive disease. Such response classifications are often used to guide subsequent treatment decisions and can be predictive of clinical outcome (2–4).
Most response assessment studies measure the change in maximum SUV (SUVmax), a single-pixel value that is adversely affected by noise (5–8), which leads to uncertainty in the quantification of treatment response. Consequently, peak SUV (SUVpeak), defined as the average SUV within a small, fixed-size region of interest (ROIpeak) centered on a high-uptake part of the tumor, has been suggested as a more robust alternative (9) (Fig. 1). Because ROIpeak spans a larger volume than the single pixel used for SUVmax, SUVpeak is less affected by image noise than SUVmax (6,7,10) and therefore is expected to reduce uncertainties in the quantification of response to therapy.
There is a wide variety of SUVpeak definitions in the literature, and they differ in the shape, size, and location of the ROIpeak (Fig. 1). Shapes and sizes include square and cuboidal regions with side lengths ranging from 7 to 15 mm (5,8,11–13), as well as circular, cylindric, and spheric regions with diameters ranging from 9 to 17 mm (6,7,9,14–16). Locations include the tumor region with the highest radiotracer uptake, the tumor region yielding the greatest SUVpeak, and the tumor region containing the voxel of maximum uptake.
The definition of SUVpeak could significantly affect the quantification of treatment response. Altering the size, shape, or location of ROIpeak may affect the relative change in SUVpeak and ultimately the classification of response. Uncertainties in the quantification of response could have significant implications regarding treatment decisions and clinical prognoses. Furthermore, these uncertainties could influence the recommendation to use SUVpeak rather than SUVmax for response assessment. Consequently, we investigated the impact of different ROIpeak definitions on the quantification of SUVpeak and tumor response to therapy.
MATERIALS AND METHODS
Treatment and Imaging
Seventeen patients with advanced solid malignancies were treated with a multitargeted receptor tyrosine kinase inhibitor with antiproliferative and antiangiogenic effects. Malignancies included a diverse range of tumor types: renal cell carcinoma (n = 7), esophagus (n = 2), hepatocellular (n = 2), prostate (n = 1), sarcoma (n = 1), small cell lung (n = 2), thymus (n = 1), and uterine carcinosarcoma (n = 1). Response to therapy was measured using the PET radiotracer 3′-deoxy-3′-18F-fluorothymidine (18F-FLT). As a surrogate of cellular proliferation, 18F-FLT is emerging as a promising candidate for chemotherapy response assessment as demonstrated in patients with lymphoma, breast cancer, and glioma (17–23). Patients were injected intravenously with 240 MBq (6.5 mCi) of 18F-FLT and underwent whole-body PET/CT at baseline (pretreatment) and during treatment using a Discovery LS PET/CT scanner (GE Healthcare). 18F-FLT was synthesized following the method described by Martin et al. with slight modifications (24). PET/CT began 47 ± 4 min after injection and extended inferiorly from the base of the skull to the distal femora. Acquisition time was 10 min per bed position. PET images were reconstructed on a 128 × 128 grid over a 50-cm field of view using the ordered-subset expectation maximization algorithm with 2 iterations, 28 subsets, a 5-mm gaussian loop (interiteration) filter, a 3-mm gaussian postprocessing filter, and CT attenuation correction. On average, patient weight changed only 1.5% between the 2 PET scans.
The study protocol was approved by the University of Wisconsin Health Sciences Institutional Review Board, the Scientific Review Board of the University of Wisconsin Carbone Comprehensive Cancer Center, and the University of Wisconsin Radiation Drug Research Committee. Written informed consent was obtained from each patient before enrollment in the study.
Quantification of SUVpeak and Tumor Response
PET activity concentrations (MBq/cm3) were converted to standardized uptake values by dividing by the injected activity per patient mass. 18F-FLT–avid lesions (∼2/patient) were segmented on PET images by an experienced nuclear medicine physician. The location and number of lesions were as follows: lung, 14; mediastinum, 5; liver, 4; abdomen, 3; adrenal, 2; gastrointestinal tract, 2; pelvis, 1; gluteus, 1; uterus, 1; arm, 1; bone, 1. Tumor volumes ranged from 1 cm3 to 530 cm3, with an average of 66 cm3.
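The conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a tissue density of 1 g/cm3 (so that SUV is dimensionless) and that the concentration and injected activity are decay-corrected to the same time point. The function name and the example numbers are hypothetical.

```python
def activity_to_suv(conc_mbq_per_cm3, injected_mbq, body_mass_kg):
    """Convert an activity concentration to a body-weight-normalized SUV.

    Assumes tissue density of 1 g/cm3 and that both activities are
    decay-corrected to the same time point.
    """
    if injected_mbq <= 0 or body_mass_kg <= 0:
        raise ValueError("injected activity and body mass must be positive")
    body_mass_g = body_mass_kg * 1000.0
    # SUV = concentration / (injected activity per gram of body mass)
    return conc_mbq_per_cm3 / (injected_mbq / body_mass_g)


# Hypothetical example: 0.012 MBq/cm3 in a 75-kg patient injected with 240 MBq
suv = activity_to_suv(0.012, 240.0, 75.0)
```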
For each tumor, 24 different SUVpeaks were determined by changing the region of interest (ROIpeak) used to measure SUVpeak (Fig. 2). The shape, size, and location of ROIpeak were varied as follows: for shape, circular (2-dimensional) or spheric (3-dimensional) ROIs were used; for size, ROI diameters of 7.5, 10, 12.5, 15, 17.5, or 20 mm, encompassing the range of fixed ROI lengths in the literature, were used; for location, the ROI was centered on SUVmax or was placed in the highest-uptake region. An example ROI is shown in Supplemental Figure 1, where a 12.5-mm-diameter circular ROI was placed in the highest-uptake region of a lung lesion (supplemental materials are available online only at http://jnm.snmjournals.org).
SUVpeak was determined automatically. First, an ROI (circular or spheric) was centered on each tumor voxel and the average SUV within the ROI was determined by weighting each voxel uptake value by the percentage of its volume contained within the ROI. The ROI location yielding the greatest average SUV was defined as the highest-uptake region of the tumor (Supplemental Fig. 1). In addition, an ROI (circular or spheric) was centered on SUVmax and the average SUV within the ROI was determined.
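The search described above might be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: a voxel counts as fully inside the ROI if its center lies within the radius, whereas the text weights each voxel by the fraction of its volume inside the ROI. The data layout (a dictionary of voxel indices) and all function names are hypothetical.

```python
import math


def roi_mean(image, center, radius_mm, voxel_mm, spherical=True):
    """Mean SUV of voxels whose centers lie within the ROI.

    image: dict mapping (i, j, k) voxel indices to SUV.
    Simplification: binary inclusion instead of fractional-volume weighting.
    """
    ci, cj, ck = center
    values = []
    for (i, j, k), suv in image.items():
        if not spherical and k != ck:
            continue  # circular ROI: stay in the axial slice of the center
        d = math.sqrt(((i - ci) * voxel_mm[0]) ** 2
                      + ((j - cj) * voxel_mm[1]) ** 2
                      + ((k - ck) * voxel_mm[2]) ** 2)
        if d <= radius_mm:
            values.append(suv)
    return sum(values) / len(values)


def suv_peak(image, tumor, diameter_mm, voxel_mm, spherical=True,
             location="highest"):
    """SUVpeak for one ROIpeak definition.

    location="max": ROI centered on the SUVmax voxel of the tumor.
    location="highest": ROI centered on the tumor voxel giving the greatest mean.
    """
    radius = diameter_mm / 2.0
    if location == "max":
        center = max(tumor, key=lambda v: image[v])
        return roi_mean(image, center, radius, voxel_mm, spherical)
    return max(roi_mean(image, v, radius, voxel_mm, spherical) for v in tumor)


# Toy example: a single row of 4-mm voxels with a noisy spike next to a plateau
row = [1, 1, 10, 1, 6, 7, 6, 1]
image = {(i, 0, 0): float(v) for i, v in enumerate(row)}
tumor = list(image)
voxel_mm = (4.0, 4.0, 4.0)
peak_on_max = suv_peak(image, tumor, 10.0, voxel_mm, location="max")
peak_highest = suv_peak(image, tumor, 10.0, voxel_mm, location="highest")
```

In this toy case the ROI centered on SUVmax averages the spike with its low neighbors, while the highest-uptake placement settles on the plateau, so the two location rules give different SUVpeaks for the same tumor.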
For each tumor, the 24 SUVpeaks were normalized to the mean intratumor SUVpeak (Eq. 1) and their variation was measured using the coefficient of variation (CV) and range.
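As an illustration of this normalization, the sketch below computes the CV and range for one tumor. The use of the sample SD (n − 1 denominator) is an assumption, since the text does not say which form was used; the function name and the three example values are hypothetical (in the study, each tumor has 24 values).

```python
import statistics


def normalized_variation(suv_peaks):
    """Normalize SUVpeaks to their mean; return (CV, fraction above, fraction below)."""
    mean = statistics.mean(suv_peaks)
    normalized = [s / mean for s in suv_peaks]
    # The sample SD of mean-normalized values equals the coefficient of variation
    cv = statistics.stdev(normalized)
    return cv, max(normalized) - 1.0, 1.0 - min(normalized)


# Hypothetical SUVpeaks for one tumor under three ROIpeak definitions
cv, above, below = normalized_variation([8.0, 10.0, 12.0])
```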
For each ROIpeak definition, a population average SUVpeak (Eq. 2) was determined over all tumors.
Tumor response (R) during treatment was defined as the relative change in SUVpeak normalized to the baseline SUVpeak (Eq. 3).
For each tumor, the 24 different SUVpeaks gave rise to 24 different responses whose variation was measured using the standard deviation (SD) and range. For each ROIpeak definition, a population average response (Eq. 4) was determined over all tumors.
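The typeset forms of Eqs. 1–4 are not reproduced in this text. One reading consistent with the verbal definitions above is sketched below. The notation is assumed rather than taken from the authors: i indexes the 24 ROIpeak definitions, t indexes tumors, and N is the number of tumors. Whether the population average in Eq. 2 was taken over raw or normalized SUVpeaks is not stated; raw values are assumed here.

```latex
\begin{align}
\widehat{\mathrm{SUV}}_{\mathrm{peak}}^{\,i,t}
  &= \frac{\mathrm{SUV}_{\mathrm{peak}}^{\,i,t}}
          {\frac{1}{24}\sum_{j=1}^{24}\mathrm{SUV}_{\mathrm{peak}}^{\,j,t}} \tag{1}\\
\overline{\mathrm{SUV}}_{\mathrm{peak}}^{\,i}
  &= \frac{1}{N}\sum_{t=1}^{N}\mathrm{SUV}_{\mathrm{peak}}^{\,i,t} \tag{2}\\
R^{\,i,t}
  &= \frac{\mathrm{SUV}_{\mathrm{peak,treatment}}^{\,i,t}-\mathrm{SUV}_{\mathrm{peak,baseline}}^{\,i,t}}
          {\mathrm{SUV}_{\mathrm{peak,baseline}}^{\,i,t}} \tag{3}\\
\overline{R}^{\,i}
  &= \frac{1}{N}\sum_{t=1}^{N}R^{\,i,t} \tag{4}
\end{align}
```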
Tumor response was also determined using SUVmax for comparison with response measured using SUVpeak. SUVmax can be considered as a special case of SUVpeak in the limit of a very small ROIpeak (single-voxel ROIpeak). For SUVmax, an equivalent diameter of 5 mm was derived by calculating the diameter of a sphere whose volume equaled the volume of the single voxel (65 mm3) represented by SUVmax.
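The equivalent-diameter calculation follows from the volume of a sphere, V = πd³/6. The short sketch below (function names are hypothetical) checks the 5-mm figure for the 65-mm3 voxel and, using the inverse relation, the volume of the largest (20-mm) spheric ROIpeak.

```python
import math


def equivalent_sphere_diameter(volume_mm3):
    """Diameter (mm) of a sphere with the given volume: d = (6V/pi)^(1/3)."""
    return (6.0 * volume_mm3 / math.pi) ** (1.0 / 3.0)


def sphere_volume(diameter_mm):
    """Volume (mm3) of a sphere with the given diameter."""
    return math.pi * diameter_mm ** 3 / 6.0


d_voxel = equivalent_sphere_diameter(65.0)    # close to 5 mm
v_largest_ml = sphere_volume(20.0) / 1000.0   # close to 4.2 mL
```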
One-way ANOVA was used to test whether the ROIpeak definition resulted in statistically significant differences in SUVpeak and tumor response. The Levene test was used to check for equal variance, and means were compared with the Bonferroni test. Differences were considered statistically significant at P values less than the Bonferroni-corrected α-level of 0.05/24 (≈0.002). Correlations between the variation in SUVpeak and tumor response and other tumor characteristics were tested using the Pearson correlation coefficient (r) and considered statistically significant at P values less than 0.05.
RESULTS
Individual Tumors
SUVpeak
Within individual tumors, considerable variation in SUVpeak resulted from changing the ROIpeak definition. The variable ROIpeak definition led to an intratumor SUVpeak variation ranging from 49% above to 46% below the mean, resulting in a 17% CV. These intratumor variations in SUVpeak are highlighted for a retroperitoneal lesion in Supplemental Figure 2 and for all lesions in Supplemental Figure 3.
The size of ROIpeak caused more variation in intratumor SUVpeak than did the location or shape of ROIpeak (Supplemental Fig. 3). Within individual tumors, varying ROIpeak diameter resulted on average in a 14% CV associated with SUVpeak, compared with a CV of 9% and 5% when the location or shape, respectively, of ROIpeak was varied. In general, intratumor SUVpeak tended to decrease, but its variation tended to increase as the size of ROIpeak increased (Supplemental Fig. 2).
There was no significant correlation between tumor size and the variation in intratumor SUVpeak (Supplemental Fig. 3, tumors ordered by size). Furthermore, there was no significant correlation between intratumor uptake heterogeneity (measured by CV of tumor uptake) and the variation in intratumor SUVpeak.
Tumor Response
Within individual tumors, a substantial variation in tumor response resulted from changing the ROIpeak definition. Intratumor response ranged from 49% above to 35% below the mean, resulting in a 9% SD. These intratumor variations in response are highlighted for a retroperitoneal lesion in Figure 3 and for all lesions in Figure 4. Responses determined using SUVmax were within the range of responses quantified with SUVpeak in almost 70% of all tumors (Fig. 4).
Variation in intratumor response resulted in the ambiguous classification of individual tumors into multiple response categories (Table 1; Figs. 3 and 4). Different response thresholds were applied to classify tumors into response categories (e.g., +30% and –30% for progressive disease/stable disease and stable disease/partial response thresholds, respectively, as recommended by PET Response Criteria in Solid Tumors [PERCIST]). When response thresholds of ±20%, ±30%, and ±40% were applied, 55%, 42%, and 32%, respectively, of all tumors suffered from an ambiguous response classification. In addition, response classifications using SUVpeak and SUVmax were compared (Table 1; Figs. 3 and 4).
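The notion of an ambiguous classification can be made concrete with a short sketch. It is illustrative only: it models just the three threshold-based categories (complete response is not derived from a threshold here), the handling of values exactly at a threshold is an assumption, and the function names and example responses are hypothetical.

```python
def classify(response, threshold=0.30):
    """Map a fractional change in SUVpeak to a response category."""
    if response >= threshold:
        return "progressive disease"
    if response <= -threshold:
        return "partial response"
    return "stable disease"


def is_ambiguous(responses, threshold=0.30):
    """True if one tumor's responses fall into more than one category."""
    return len({classify(r, threshold) for r in responses}) > 1


# Hypothetical responses for one tumor under three ROIpeak definitions
responses = [-0.22, -0.28, -0.35]
ambiguous_30 = is_ambiguous(responses, 0.30)  # straddles the -30% threshold
ambiguous_40 = is_ambiguous(responses, 0.40)  # all within the +/-40% band
```

A tumor is counted as ambiguous when the set of categories produced by its ROIpeak-specific responses has more than one member, so a tumor whose responses lie close to a threshold is the most likely to be affected.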
The size, location, and shape of ROIpeak all caused similar variations in intratumor response (Fig. 4). Within individual tumors, varying ROIpeak size, location, and shape resulted on average in respective SDs of 5%, 7%, and 5% associated with tumor response.
In general, the magnitude of intratumor response was independent of the size of ROIpeak. However, the variation in intratumor response tended to increase as the size of ROIpeak increased (Fig. 3). The magnitude and variation of intratumor response were independent of ROIpeak shape and location.
A strong correlation was exhibited between tumor response (average of all 24 SUVpeak responses for each tumor) and variation in intratumor response (r = 0.81, P < 0.001, Figs. 4–6). Variation in intratumor response tended to increase as response increased (i.e., as response worsened from partial response to stable disease to progressive disease). There was no significant correlation between tumor size and variation in intratumor response (Fig. 4, tumors ordered by size). Furthermore, there was no significant correlation between intratumor uptake heterogeneity (measured by CV of tumor uptake) and variation in intratumor response.
Population Average
SUVpeak
Quantification of the population average SUVpeak was substantially affected by changing the ROIpeak definition. For different ROIpeak definitions, the population average SUVpeak ranged from 24% above to 28% below the mean, resulting in a 14% CV (Supplemental Fig. 4). Differences in SUVpeak (associated with the ROIpeak definitions) between the populations were statistically significant (P < 0.001).
The size of ROIpeak caused more variation in the population average SUVpeak than did the location or shape of ROIpeak (Supplemental Fig. 4). Varying ROIpeak diameter resulted in a 13% CV associated with the population average SUVpeak, compared with a CV of 7% and 4%, respectively, when ROIpeak location or shape was varied. Trends observed in the population average SUVpeak reflected trends associated with intratumor SUVpeak. As the size of ROIpeak increased, the population average SUVpeak tended to decrease and its variation increased (Supplemental Fig. 4).
Tumor Response
Tumor response during treatment averaged –21% but ranged as high as +116% and as low as –80% (Fig. 4). However, population average tumor response was not significantly affected by changing the ROIpeak definition. For different ROIpeak definitions, the population average tumor response ranged from only 3% above to 3% below the mean, resulting in a 2% SD (Fig. 7). Differences in response (associated with the ROIpeak definitions) between the populations were not statistically significant (P = 1.00).
Size, location, and shape of ROIpeak all caused minimal variations in population average tumor response (Fig. 7), as all SDs were less than 2%. The magnitude and variation of population average tumor response were independent of the size, shape, and location of ROIpeak (Fig. 7).
Tumor Subgroup Analysis
Variation in SUVpeak and tumor response was determined using all 35 tumors assessed in this study. In addition, the results were recalculated on 2 different tumor subgroups. The first subgroup, in which lesions were in regions without significant background activity (n = 23), was studied in order to reduce the chance that elevated background activity was incorrectly included in ROIpeak. Consequently, abdominal, hepatic, renal, and bone lesions were excluded from this group. The second subgroup was one in which lesions were larger than 20 mL (n = 12), which is approximately 5 times larger than the largest ROIpeak (4.2 mL, 20-mm diameter). Studying this subgroup ensured that ROIpeak was completely inside the tumor boundaries and that no background activity was incorrectly included in ROIpeak. Results for the 2 tumor subgroups were almost identical to results determined using all tumors (Table 2).
DISCUSSION
Individual Tumor Response Versus Population Average Response
The region of interest used to determine SUVpeak can have a profound effect on its quantification and on the response of individual tumors. On average, different ROIpeak definitions resulted in intratumor variations of approximately 17% and 9% for SUVpeak and tumor response, respectively, and these variations ranged as high as 50%. This degree of variation can lead to different categorizations of response (Figs. 3 and 4) using criteria such as PERCIST (9). With PERCIST, such ambiguous response categorizations arose in over 40% of the tumor responses assessed in this study (Fig. 4; Table 1). Ambiguous response categorization of tumors increased with narrower response criteria (e.g., ±20%) but was reduced using broader response criteria (e.g., ±40%), similar to that of the MUNICON phase II trial (Table 1) (23). The sensitivity of response quantification to the ROIpeak definition reveals the need to optimize PET metrics (such as SUVpeak) for quantitative response assessment in individual patients. An ambiguous response classification underscores the necessity for a unique, consistent, standard region of interest with associated criteria that can accurately assess response.
Unlike individual tumor responses, population average response was relatively insensitive to the definition of ROIpeak used to measure response (Fig. 7), as is consistent with the findings of Krak et al. (6). The small variation (only 2%) in population average response occurred because the magnitude of individual tumor responses was independent of the ROIpeak definition. Therefore, because of an averaging effect, variation was reduced when determining population average response and might be reduced even further as more tumors are included in the population average. This robustness of population average response points to the strength of PET for accurate quantification of the average response to therapy.
Effects of Different Factors on Variation in Intratumor Response
The variation in intratumor response correlated strongly with tumor response (Fig. 5). Tumors that responded well (i.e., partial response, tumor response < –30%) exhibited significantly less variation in intratumor SUVpeak response than did tumors that responded poorly (i.e., stable disease or progressive disease, tumor response > –30%). Well-responding tumors seemed to exhibit a response more uniform than the heterogeneous response of poorly responding tumors (Fig. 6). Thus, SUVpeak–based response was considerably more sensitive to the ROIpeak definition for poorly responding tumors than for well-responding tumors.
Surprisingly, neither tumor size nor tumor uptake heterogeneity had a significant effect on the variation in either intratumor response or SUVpeak. This finding suggests that the characteristics (size, heterogeneity, etc.) of only the high-uptake regions encompassed by ROIpeak, not those of the entire tumor, directly affect the variation in tumor response and SUVpeak. Though not investigated, partial-volume effects tend to reduce uptake heterogeneity and therefore are expected to reduce the variation in both SUVpeak and response. Thus, a greater variation in both SUVpeak and response should result from partial-volume correction of the PET data.
The variable ROIpeak definition led to a variation in intratumor response that was about half that of SUVpeak. Tumor response was determined via normalization by baseline SUVpeak, in effect canceling out some of the variation in SUVpeak, which may explain the reduced variation in intratumor response. Most of the variation in SUVpeak was due to the size of ROIpeak, as is consistent with the findings of Boellaard et al. (5). As expected, variation in both intratumor response and SUVpeak increased as the size of ROIpeak increased.
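The cancellation through normalization can be illustrated with an idealized case. Suppose, purely as an assumption for illustration, that switching to a different ROIpeak definition scales a tumor's SUVpeak by the same factor k at baseline and during treatment. The response R' measured with the new definition is then unchanged:

```latex
R' = \frac{k\,\mathrm{SUV}_{\mathrm{peak,treatment}} - k\,\mathrm{SUV}_{\mathrm{peak,baseline}}}
          {k\,\mathrm{SUV}_{\mathrm{peak,baseline}}}
   = \frac{\mathrm{SUV}_{\mathrm{peak,treatment}} - \mathrm{SUV}_{\mathrm{peak,baseline}}}
          {\mathrm{SUV}_{\mathrm{peak,baseline}}}
   = R
```

In practice the factor need not be the same for the two scans, so only part of the variation cancels, which is in line with response varying about half as much as SUVpeak.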
For each ROIpeak definition, population average SUVpeak preserved the trends caused by the size, shape, and location of ROIpeak. Consequently, the variation in population average SUVpeak was approximately equal to the variation in intratumor SUVpeak. This result is in contrast to tumor response, in which the variation in intratumor response for different ROIpeak definitions (9%) was much larger than the variation in population average response (2%). For tumor response, there were no significant trends caused by size, shape, or location of ROIpeak, resulting in very little variation in population average response due to an averaging effect.
The wide variation in both intratumor response and SUVpeak stemmed from changes to the size, shape, and location of ROIpeak, reflecting the range of different ROIpeak definitions found in the literature. Therefore, a wide variation in intratumor response is expected under normal, realistic conditions. It is likely that an even greater variation would occur because of errors during image analysis for response assessment. For example, improper localization of ROIpeak in an average- or low-uptake region of a tumor at baseline would result in a measured tumor response that is artificially large, leading to a more extreme variation in intratumor response.
18F-FLT, rather than 18F-FDG, was selected as a radiotracer in this study because of the antiproliferative nature of the molecular targeted therapy. Furthermore, 18F-FLT may be more effective for assessment of treatment response than is 18F-FDG (21,25,26). However, imaging of tumors using both 18F-FLT and 18F-FDG has revealed a somewhat higher SUV and broader SUV range with 18F-FDG than with 18F-FLT (17,27,28). Thus, compared with 18F-FLT, 18F-FDG is expected to result in a greater variation in both SUVpeak and tumor response due to different ROIpeaks.
SUVpeak was determined using body weight (
Implications for Treatment Response Assessment
Currently, most response assessment studies use SUVmax, although recently SUVpeak has been recommended as a more robust alternative (9). Patient-specific response quantification is subject to significant uncertainty because of the different ROIpeak definitions, and therefore SUVpeak requires further study to optimize its use for quantification of response in individual patients. Though stemming from different causes, the uncertainties associated with SUVpeak and SUVmax are comparable (6). Moreover, the noise uncertainty associated with SUVmax continues to be reduced because of the increased counts with 3-dimensional PET acquisition, the current standard on most scanners. A correlation between SUVmax and SUVpeak responses has been demonstrated (6,29), and in this study, SUVmax response was within the range of responses quantified with SUVpeak in almost 70% of all tumors (Fig. 4). Nevertheless, despite this correlation, there can be substantial differences between SUVmax and SUVpeak responses in individual tumors. For example, response quantification using the PERCIST-recommended SUVpeak (1.25-cm-diameter sphere in highest-uptake region) was 45% smaller than that of SUVmax in tumor 9 (Fig. 4), resulting in different response categorizations. Such differences underscore the need to establish the relative predictive power of SUVpeak versus SUVmax for response assessment. Consequently, the recent recommendation in favor of SUVpeak over SUVmax should be approached with caution (9). It must first be determined whether SUVpeak or SUVmax is best suited for treatment response assessment.
Clinical trials are necessary to establish the superiority of SUVpeak or SUVmax for quantification of response to therapy. These trials should investigate the sensitivities of SUVpeak and SUVmax to a variety of factors, including image noise, scan acquisition and image reconstruction parameters, partial-volume effects, tumor motion, and others. Furthermore, the clinical utility of either SUVpeak or SUVmax for response quantification will strongly depend on its correlation with patients’ clinical outcomes. Ultimately, the most robust and predictive SUV measure should be selected for quantification of treatment response.
It is probably not feasible to compare all definitions of SUVpeak with SUVmax within the context of a larger clinical trial. Rather, a standard ROIpeak should be carefully selected to determine SUVpeak. ROIpeak should be large enough to prevent SUVpeak from suffering from noise, partial-volume effects, and other sensitivities that plague SUVmax. However, ROIpeak should not be so large that it includes substantial uptake heterogeneity and voxels that lie outside the tumor. These considerations lend support to the 1.2-cm-diameter sphere recommended by PERCIST as a standard definition of ROIpeak (for 2-cm or larger diameter tumors). This size is in the middle of the range of ROIpeak definitions found in the literature.
Identification of a suitable SUV measure for response quantification requires clinical trials. After these trials, thresholds for the different response categories (complete response, partial response, stable disease, and progressive disease) can be established using population average response data in which the uncertainties are small. Unique thresholds may be established for specific diseases and their associated therapies. The size of the thresholds will need to exceed the overall uncertainty associated with the selected SUV measure (SUVpeak or SUVmax). Subsequently, the SUV measure could be quantified in individual patients to gauge their response to therapy using the established response thresholds as a guide.
CONCLUSION
Quantification of individual tumor response with SUVpeak is sensitive to the region of interest used to determine SUVpeak. Changes to the size, shape, and location of ROIpeak result in substantial variation (≤50%) in both SUVpeak and tumor response for individual tumors. These considerable uncertainties in SUVpeak and tumor response call into question recommendations favoring SUVpeak over SUVmax for quantification of treatment response. Clinical trials are necessary to compare the efficacy of SUVpeak and SUVmax for quantification of response to therapy.
DISCLOSURE STATEMENT
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
This work was financially supported by NIH grant R01 CA136927. We acknowledge the PET technologists Chris Jaskowiak and Mark McNall for scanning patients after hours as well as the University of Wisconsin Cyclotron Research Group for producing the 18F-FLT used in the study. No other potential conflict of interest relevant to this article was reported.
© 2012 by the Society of Nuclear Medicine, Inc.
Received for publication May 19, 2011.
Accepted for publication October 13, 2011.