Abstract
Qualitative tumor response assessment with 18F-FDG PET and tumor-to-background ratios both compare target uptake against blood-pool or liver activity. Semiquantitation with the standardized uptake value (SUV) is susceptible to artifacts and relies on a stable normal-tissue baseline for validation. The aim of this study was to document the normal intrapatient range of scan-to-scan variation in blood-pool and liver SUVs and to identify factors that may adversely affect it (i.e., increase its spread). Methods: Between July 2009 and June 2010, 132 oncology patients had 2 PET/CT scans. Patient preparation, acquisition, and reconstruction protocols were held constant, uniform, and reproducible. Mean body weight–based SUVs were obtained from 2-dimensional regions of interest in the aortic arch blood pool and in the right lobe of the liver. Results: Of the 132 patients, 65 had lymphoma. The mean age of the cohort was 62.5 y. The group’s mean serum glucose level was 6.0 mmol/L at the first visit and 5.9 mmol/L at the second visit. The mean 18F-FDG dose was 4.1 MBq/kg at the first visit and 4.0 MBq/kg at the second. At the first visit, the group’s mean blood-pool SUV was 1.55 (SD, 0.38); at the second, it was 1.58 (SD, 0.37), a difference that was not statistically significant. The group’s mean liver SUV was 2.17 (SD, 0.44) at the first visit and 2.29 (SD, 0.44) at the second (P = 0.005). Visit-to-visit intrapatient variation in blood-pool and liver SUVs had gaussian distributions. The variation in blood-pool SUV had a mean of 0.03 and an SD of 0.42. The variation in liver SUV had a mean of 0.12 and an SD of 0.50. Using the fifth and 95th percentiles, the reference range in our patient population for intrapatient variation was −0.8 to 0.9 for blood-pool SUV and −0.9 to 1.1 for liver SUV. Subanalysis by cancer type and chemotherapy status suggested that the rise in liver SUV between the 2 visits was largely due to the commencement of chemotherapy, but regression modeling identified no factor that systematically affected intrapatient variation or increased its spread. 
Conclusion: In our patient cohort, the reference range for intrapatient variation in blood-pool and liver SUVs is −0.8 to 0.9 and −0.9 to 1.1, respectively.
Quantitative measurement of tumor 18F-FDG uptake is used as a proxy for tumor biologic activity. Change in tumor 18F-FDG uptake is used as a proxy for tumor response to therapy or tumor progression. Absolute quantitation is not feasible in clinical practice, and semiquantitative methods are used to derive a single utilitarian numeric value. This single numeric value is usually a standardized uptake value (SUV), however calculated (1,2). Many technical factors affect SUVs and may lead to spurious variation (3,4). Standardization of semiquantitative measurement is the usual clinical method of validating semiquantitation. Standardization in its simplest form is a ratio of 18F-FDG uptake in tumor to that in normal background tissue, popularized as T/B (tumor-to-background) ratio. The ratio is dimensionless; it retains its meaning whether voxel values are encoded with arbitrary activity units or with SUVs. Modern PET processing has all but eliminated arbitrary units in favor of SUV maps, and T/B ratios are usually calculated as tumor SUV to background SUV (5,6). Tissues most commonly used as background in clinical practice and popularized in the literature are normal liver, blood pool, cerebellum, lung, and resting muscle (7).
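A minimal Python sketch of why the ratio is dimensionless: body-weight SUV multiplies every voxel of a scan by the same factor (body weight divided by injected activity), so that factor cancels in a T/B ratio. The activity concentrations, weight, and injected activity below are hypothetical values chosen for illustration.

```python
def tb_ratio(target, background):
    """Tumor-to-background ratio; any per-scan scale factor shared by both cancels."""
    return target / background

# Hypothetical activity concentrations (kBq/mL) measured on one scan.
tumor_conc = 12.0
liver_conc = 3.0

# Hypothetical body-weight SUV scale factor for that scan:
# body weight (kg) / injected activity (MBq) = 75 / 300.
scale = 75.0 / 300.0

ratio_from_activity = tb_ratio(tumor_conc, liver_conc)
ratio_from_suv = tb_ratio(tumor_conc * scale, liver_conc * scale)
print(ratio_from_activity, ratio_from_suv)  # 4.0 4.0
```

The cancellation holds only within a single scan; between scans the background itself can change, which is the question this study addresses.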
Qualitative assessment of tumor activity has its own school of adherents and has been shown on many occasions to predict clinical outcomes as accurately as semiquantitation (8). Qualitative assessment relies on visual comparison of uptake in tumor with that in other tissues, the most common comparison tissues being normal parent tissue surrounding the tumor, blood pool, normal liver, cerebellum, lung, resting muscle, and subcutaneous tissues. Scan-to-scan reproducibility and institution-to-institution portability of qualitative assessment rely on grading tumor uptake against a qualitative but discrete scale of uptake in normal tissues, arranged in ascending order from least 18F-FDG–avid to most 18F-FDG–avid (9). An early, validated, and enduring example of such a ladder scale is the Peter MacCallum Cancer Centre 18F-FDG grading system for characterization of solitary pulmonary nodules, which uses any visible 18F-FDG uptake, uptake in the surrounding lung, and blood-pool uptake as its breakpoints to create a 4-point scale. The scale is corrected for partial-volume artifacts in small nodules and is an accurate assessment tool in the Australian patient population (10).
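The ladder principle can be sketched as an ordinal lookup: reference tissues are ordered from least to most 18F-FDG–avid, and a lesion moves up one grade for each reference level its uptake exceeds. This Python sketch is generic and does not reproduce the published Peter MacCallum criteria; the numeric breakpoints are hypothetical stand-ins for what is, in practice, a reader's visual comparison.

```python
from bisect import bisect_left

def ladder_grade(lesion_uptake, breakpoints):
    """Ordinal grade 1..len(breakpoints) + 1 on a ladder of reference tissues.

    breakpoints must be strictly ascending (least to most FDG-avid).
    A lesion moves up one grade for each breakpoint it strictly exceeds;
    uptake equal to a breakpoint stays at the lower grade.
    """
    if any(a >= b for a, b in zip(breakpoints, breakpoints[1:])):
        raise ValueError("breakpoints must be strictly ascending")
    # bisect_left counts the breakpoints that lie strictly below lesion_uptake.
    return bisect_left(breakpoints, lesion_uptake) + 1

# Hypothetical stand-ins for: just-visible uptake, surrounding lung, blood pool.
BREAKPOINTS = [0.3, 0.6, 1.5]  # three breakpoints give a 4-point scale

print([ladder_grade(u, BREAKPOINTS) for u in (0.1, 0.6, 1.0, 2.0)])  # [1, 2, 3, 4]
```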
Serial comparison of tumor activity (either semiquantitative or visual) implicitly assumes stability of normal-tissue 18F-FDG uptake in its premise of scan-to-scan comparability. Use of T/B ratios for serial comparison of tumor activity explicitly corrects for variations in normal background tissue.
Scan-to-scan stability of normal-tissue 18F-FDG uptake cannot be assumed even if technical factors confounding SUV calculation have been ruled out. Even under technically ideal circumstances, there must exist a range of 18F-FDG uptake variability in normal tissues in any single subject.
A considerable amount of work has been done to identify the range of 18F-FDG uptake for normal tissues and different tumor types (usually expressed in SUV units). However, relatively little work has been done or published to define the range of normal intrapatient scan-to-scan variability of 18F-FDG uptake in healthy, normal tissues (11). Once known, this variability is of clinical utility; any scan-to-scan change in 18F-FDG uptake falling within this variability range is unlikely to be clinically significant. Although this range can be reasonably defined only for normal tissues, the reliance of tumor assessment on comparison with normal tissues means that a scan-to-scan change in tumor 18F-FDG uptake that falls within this range should be treated with caution. Conversely, a scan-to-scan change in background 18F-FDG uptake that falls significantly outside this range should trigger a search for technical confounders and errors in SUV calculations.
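A minimal Python sketch of how such a range could be applied during reading. The reference ranges are those reported for this cohort in Results (signed change, visit 2 minus visit 1); the function names, and the choice to test tumor change against the liver range by default, are illustrative assumptions rather than a protocol from this study.

```python
# Reference ranges for visit-to-visit change (visit 2 minus visit 1) reported for this cohort.
BLOOD_POOL_RANGE = (-0.8, 0.9)
LIVER_RANGE = (-0.9, 1.1)

def within(delta, ref_range):
    """True when a signed change lies inside the closed reference range."""
    low, high = ref_range
    return low <= delta <= high

def review_background(bp_v1, bp_v2, liver_v1, liver_v2):
    """List background changes large enough to prompt a search for technical confounders."""
    warnings = []
    if not within(bp_v2 - bp_v1, BLOOD_POOL_RANGE):
        warnings.append("blood-pool SUV change outside reference range")
    if not within(liver_v2 - liver_v1, LIVER_RANGE):
        warnings.append("liver SUV change outside reference range")
    return warnings

def tumor_change_needs_caution(tumor_v1, tumor_v2, ref_range=LIVER_RANGE):
    """True when a tumor SUV change is no larger than normal background variation."""
    return within(tumor_v2 - tumor_v1, ref_range)

print(review_background(1.5, 2.6, 2.2, 2.4))  # ['blood-pool SUV change outside reference range']
print(tumor_change_needs_caution(8.0, 7.5))   # True
```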
We hypothesized that in any individual there exists a statistical variability in blood-pool SUV and liver SUV measured at different time points and that this variability constitutes a physiologic limit to the reproducibility of scan-to-scan blood-pool and liver SUV.
The aims of this work were to define the reference range of scan-to-scan variation in blood-pool and liver SUV for an individual patient and to identify which factors, if any, adversely affect scan-to-scan reproducibility by increasing the spread of scan-to-scan variation.
MATERIALS AND METHODS
Patients
Between July 2009 and June 2010 (inclusive), a total of 132 oncology patients underwent 2 or more PET scans using a Biograph 16 PET/CT scanner (Siemens AG). The following data were collected at each visit: patient data (age [y], weight [kg], height [cm], diabetes status [yes or no], and cancer group [1–12]), scan-related data (fasting interval [h], serum glucose level at time of 18F-FDG injection [mmol/L], total injected 18F-FDG dose [MBq], uptake time [defined as minutes from injection of 18F-FDG to commencement of imaging], and recent chemotherapy [yes or no, defined as chemotherapy within the last 6 wk]), and the mean blood-pool and liver SUVs.
At the second visit, the interval (days) from the first visit was recorded.
The cancer groups and patient number in each group are presented in Table 1.
Patient Preparation and Scan Acquisition.
All patients fasted either overnight (for morning scans) or for at least 6 h (at 3 of the 264 visits, the patient fasted only 4 h because of residence in a remote area requiring a same-day scan). All patients were instructed to avoid strenuous activity in the preceding 24 h, to keep warm, and to drink plain water.
Separate intravenous access was obtained for 18F-FDG injection in patients with central lines, catheters, or ports. Serum glucose was routinely measured before injection, and all patients with a level above 8 mmol/L (marginally high) were reviewed. Six had a level above 8 mmol/L, and only 2 had a level above 10 mmol/L (10.1 and 11.7). After injection with 4 MBq of 18F-FDG (range, 3–5 MBq) per kilogram of body weight, the patients rested comfortably in a heated uptake room, sleeping or watching television.
The uptake period was as close to 60 min as was possible in a busy clinical center. If oral contrast medium was required for subsequent diagnostic oncologic CT scanning, the contrast was drunk during the uptake period. Only negative oral contrast (1.3% methylcellulose in water), not positive oral contrast, was used.
The injected dose was calculated by measurement of the 18F-FDG activity in the syringe before and after the injection. Immediately before undergoing scanning, the patients voided.
After a topogram had been obtained to determine scan limits, a low-dose CT scan for attenuation correction and anatomic correlation was acquired (effective current, 75 mAs, with dose reduction [CARE Dose 4D; Siemens]; 3.0-mm slice collimation). Immediately after this scan, a whole-body PET emission scan was acquired. For malignancy in the head and neck area, the field of view was from the vertex to the upper thighs; for melanoma, from the vertex to the toes; and for all other malignancies, from the skull base to the upper thighs. Depending on patient height, this last field of view was achieved with 6 or 7 overlapping bed positions. At each bed position, emission data were acquired in 3-dimensional mode for between 3 and 4 min at the technologist’s discretion, with the aim of optimizing counts while minimizing on-bed time.
The longer time per bed position was used in larger patients to compensate for greater body attenuation and maintain adequate count density.
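The quoted figures imply roughly 6 × 3 = 18 to 7 × 4 = 28 min of emission acquisition. The Python sketch below shows how the bed count follows from scan length; the 16.2-cm axial field of view and 4-cm overlap are assumed values for illustration, not parameters stated in this paper.

```python
import math

def bed_positions(scan_length_cm, axial_fov_cm=16.2, overlap_cm=4.0):
    """Minimum number of bed positions covering scan_length_cm.

    n positions cover axial_fov + (n - 1) * (axial_fov - overlap) cm.
    The default field of view and overlap are assumptions for illustration.
    """
    step = axial_fov_cm - overlap_cm
    if step <= 0:
        raise ValueError("overlap must be smaller than the axial field of view")
    if scan_length_cm <= axial_fov_cm:
        return 1
    return math.ceil((scan_length_cm - overlap_cm) / step)

def emission_minutes(n_beds, minutes_per_bed):
    """Total emission time for n_beds at a fixed time per bed position."""
    return n_beds * minutes_per_bed

# Hypothetical skull-base-to-thigh lengths for a shorter and a taller patient.
for length_cm in (75.0, 80.0):
    n = bed_positions(length_cm)
    print(length_cm, n, emission_minutes(n, 3), emission_minutes(n, 4))
```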
To ensure the reproducibility of SUV measurements, a cross-calibration procedure between the PET scanner and the dose calibrator was performed at installation. The dose calibrator (CRC-25PET; Capintec Inc.) was calibrated at manufacture with sources traceable to National Institute of Standards and Technology standards. Weekly checks were performed with a 137Cs source. Annually, the calibrator was compared with a tertiary reference standard (traceable to the Australian Activity Standards Laboratory) to ensure the accuracy and precision of dose measurements to within ±10%. In addition to measurement of the residual dose for each patient, a decay correction was applied if more than 5 min elapsed between initial dose calibration and injection. Daily quality control procedures were performed on the PET scanner using a 68Ge source, according to the manufacturer’s protocol.
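The dose bookkeeping described above (syringe activity measured before and after injection, with decay correction when calibration precedes injection by more than 5 min) can be sketched in Python. The 18F half-life of 109.77 min is a physical constant; correcting the residual reading back to injection time is an added assumption, and the helper names are hypothetical.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, minutes

def decay(activity_mbq, minutes, half_life_min=F18_HALF_LIFE_MIN):
    """Activity after `minutes` of decay; a negative value corrects backward in time."""
    return activity_mbq * math.exp(-math.log(2) * minutes / half_life_min)

def net_injected_mbq(pre_mbq, calib_to_injection_min,
                     residual_mbq, injection_to_residual_min=0.0,
                     threshold_min=5.0):
    """Net activity at the time of injection (hypothetical helper).

    The pre-injection syringe reading is decay-corrected only when the
    calibration-to-injection delay exceeds threshold_min, as in the text.
    The residual reading is always corrected back to injection time (assumption).
    """
    pre_at_injection = pre_mbq
    if calib_to_injection_min > threshold_min:
        pre_at_injection = decay(pre_mbq, calib_to_injection_min)
    residual_at_injection = decay(residual_mbq, -injection_to_residual_min)
    return pre_at_injection - residual_at_injection

# Hypothetical example: 320 MBq drawn, injected 20 min later, 8 MBq left in the syringe.
print(round(net_injected_mbq(320.0, 20.0, 8.0), 1))
```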
Image Reconstruction and SUV Documentation.
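All emission scans were processed with iterative reconstruction (ordered-subsets expectation maximization, 4 iterations, and 4 subsets). All CT scans were processed with filtered backprojection. The emission dataset was a volume of 3-mm voxels. The CT dataset was a volume of 0.6-mm voxels. The CT dataset was reformatted to 3-mm voxels and used to generate an attenuation-corrected PET volume. The simplest SUV calculation method (body weight–based SUV) was used for all patients:

$$\mathrm{SUV}_{\mathrm{bw}} = \frac{C_{\mathrm{tissue}}\ (\mathrm{kBq/mL})}{A_{\mathrm{inj}}\ (\mathrm{MBq}) \,/\, W\ (\mathrm{kg})}$$

where C_tissue is the tissue activity concentration, A_inj is the injected 18F-FDG activity, and W is body weight.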
In the remainder of this paper, blood-pool SUV and liver SUV will be used as abbreviations for the mean body weight–based SUV of the blood pool and of the liver, respectively.
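The body-weight SUV and its region-of-interest mean can be sketched in Python as follows. The sketch assumes voxel concentrations are already decay-corrected to injection time and that 1 mL of tissue weighs 1 g; because MBq/kg equals kBq/g, the SUV is then dimensionless. The example values and helper names are hypothetical.

```python
from statistics import mean

def suv_bw(conc_kbq_per_ml, injected_mbq, weight_kg):
    """Body weight-based SUV: concentration / (injected activity / body weight)."""
    if injected_mbq <= 0 or weight_kg <= 0:
        raise ValueError("injected activity and weight must be positive")
    return conc_kbq_per_ml / (injected_mbq / weight_kg)

def roi_mean_suv(voxel_concs_kbq_per_ml, injected_mbq, weight_kg):
    """Mean SUVbw over the voxels inside a region of interest."""
    return mean(suv_bw(c, injected_mbq, weight_kg) for c in voxel_concs_kbq_per_ml)

# Hypothetical patient: 70 kg, 280 MBq injected (4 MBq/kg), three liver ROI voxels.
liver_voxels = [8.8, 9.2, 9.0]  # kBq/mL
print(roi_mean_suv(liver_voxels, 280.0, 70.0))
```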
Ethics Considerations
All data presented in this study were collected contemporaneously and used in direct patient care at the time of patient attendance. No data were collected specifically for the purposes of this study. All patient identifiers were stripped from the dataset, with patient privacy maintained at the level of the patient record in the department’s Radiology Information System. The data analysis did not generate any additional information of utility to any individual patient. Within these parameters, this study met the institutional criteria for minimal-impact observational research and did not require formal ethics committee approval.
Statistical Analysis
The data for each patient were collected into an Excel 2003 (Microsoft Corp.) table and transferred into Stata statistical analysis software (version 8.2; StataCorp LP). First, the mean blood-pool and liver SUVs at each visit were treated as a group, and the 2 visits were compared using a paired Student t test. The SUVs at each visit were plotted for the group as a scatterplot to look for graphically detectable outliers or trends. The intrapatient visit-to-visit difference in mean blood-pool and liver SUVs was then calculated as the signed difference (visit 2 minus visit 1), and descriptive statistics of blood-pool and liver variability were generated for each. Second, a linear regression model was constructed for the variability in blood-pool and liver SUVs to identify which of the captured patient-related or scan-related factors were associated with increased variability or with its direction.
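The visit-to-visit analysis can be sketched in Python with only the standard library. Differences are signed (visit 2 minus visit 1), as the negative lower bounds of the reported reference ranges require, and the paired t statistic is the mean difference divided by its standard error. The study itself used Stata; the helper name and example values here are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_summary(visit1, visit2):
    """Signed within-patient differences (visit 2 minus visit 1) with summary statistics."""
    if len(visit1) != len(visit2) or len(visit1) < 2:
        raise ValueError("need two equal-length lists covering at least 2 patients")
    diffs = [v2 - v1 for v1, v2 in zip(visit1, visit2)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)  # sample SD, n - 1 denominator
    if d_sd == 0:
        raise ValueError("differences have zero spread; t is undefined")
    t = d_mean / (d_sd / math.sqrt(len(diffs)))
    return {"diffs": diffs, "mean": d_mean, "sd": d_sd, "t": t, "df": len(diffs) - 1}

# Hypothetical blood-pool SUVs for three patients.
result = paired_summary([1.5, 1.6, 1.4], [1.7, 1.3, 1.6])
print([round(d, 2) for d in result["diffs"]], round(result["mean"], 3), result["df"])
```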
RESULTS
Patient Cohort Parameters
The total number of patients in this study was 132. The number of patients in each cancer group is presented in Table 1. The largest group by far was lymphoma, with 65 patients returning for a second PET scan within the 12 mo of the study. Patient age range was 19–90 y; mean age was 62.5 y. Eleven patients were diabetic. The group mean serum glucose level was 5.9 mmol/L (range, 3.4–11.7) at the first visit and 6.0 mmol/L (range, 4.6–10.1) at the second, with no statistically significant difference between the 2 visits (α = 0.05). The mean 18F-FDG dose per kilogram of body weight was 4.1 MBq/kg (range, 1.5–5.7 MBq/kg) at the first visit and 4.0 MBq/kg (range, 1.9–7.2 MBq/kg) at the second, again with no statistically significant difference (α = 0.05). Uptake period data were available for 125 of 132 patients. Mean uptake time was 69.5 min (SD, 21.8 min) at the first visit and 70.0 min (SD, 20.7 min) at the second; the difference was not statistically significant (α = 0.05).
Group Mean Values for Blood Pool and Liver SUV
At the first visit, the group’s mean blood-pool SUV was 1.55, with an SD of 0.38. At the second visit, the group’s mean blood-pool SUV was 1.58, with an SD of 0.37. There was no statistically significant difference between blood-pool SUV at the first and second visits. For our group of patients and for our scanner and method combination, the reference range for blood-pool SUV can be considered to be 0.8–2.3 (mean ± 2 SDs). A rounded clinically useful value for average blood-pool SUV based on our cohort is 1.5.
At the first visit, the group’s mean liver SUV was 2.17, with an SD of 0.44. At the second visit, the group’s mean liver SUV was 2.29, with an SD of 0.44. The difference of 0.12 was statistically significant (2-tailed P = 0.005) even though the 95% confidence intervals for the 2 means overlapped. Using the mean of visit 2 ± 2 SDs gives a reference range of 1.4–3.2 for liver SUV. A rounded clinically useful value for average liver SUV based on our cohort is 2.3.
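The group reference ranges and the liver comparison can be recomputed from the rounded summary statistics with the Python sketch below, which uses normal approximations for the confidence intervals and the P value (an assumption; the study used patient-level data in Stata, so small differences are expected). It also shows why overlapping intervals and a significant paired test are compatible: overlap of two separate 95% intervals is a conservative criterion, and the paired test uses the SD of within-patient differences (0.50 for liver, reported below) rather than the SD of each visit's values.

```python
import math

N = 132  # patients with 2 visits

def ref_range(mean, sd, k=2.0):
    """mean +/- k SD, the convention used for the group reference ranges."""
    return mean - k * sd, mean + k * sd

def mean_ci95(mean, sd, n=N):
    """Normal-approximation 95% confidence interval for a group mean."""
    half_width = 1.96 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

def paired_t(mean_diff, sd_diff, n=N):
    """Paired t statistic from the mean and SD of within-patient differences."""
    return mean_diff / (sd_diff / math.sqrt(n))

def two_sided_p_normal(t):
    """Two-sided P from a normal approximation (adequate for df = 131)."""
    return math.erfc(abs(t) / math.sqrt(2))

bp_range = ref_range(1.55, 0.38)      # about 0.79 to 2.31
liver_range = ref_range(2.29, 0.44)   # about 1.41 to 3.17
ci_v1 = mean_ci95(2.17, 0.44)
ci_v2 = mean_ci95(2.29, 0.44)
cis_overlap = ci_v1[1] > ci_v2[0]     # True
t_liver = paired_t(0.12, 0.50)        # about 2.76
p_liver = two_sided_p_normal(t_liver)
print(bp_range, liver_range, cis_overlap, round(t_liver, 2), round(p_liver, 4))
```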
Subanalysis by pattern of chemotherapy change and by cancer group showed that in patients commencing chemotherapy between visits 1 and 2 (n = 63), mean liver SUV rose by 0.25 units (range, −1.1 to 2.4). The rise was more prominent in lymphoma patients commencing chemotherapy (n = 36), in whom the mean rise was 0.29 units (range, −1.1 to 2.4).
In nonlymphoma patients commencing chemotherapy (n = 27), the mean rise was 0.18 units (range, −0.6 to 1.2).
Patients on no chemotherapy at either visit (n = 28) had no appreciable change in liver SUV (mean decrease, 0.05 units), with no difference between lymphoma and nonlymphoma patients.
Patients discontinuing chemotherapy between the 2 visits showed a decrease in liver SUV of 0.22 for lymphoma (n = 6 patients; range, −0.8 to 0.2) and a nonsignificant rise of 0.04 for nonlymphoma (n = 5 patients; range, −0.6 to 0.9).
Patients who were on chemotherapy at both visits (n = 30) showed a rise in liver SUV of 0.11 (range, −0.5 to 1.1), with no appreciable difference between lymphoma and nonlymphoma patients.
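As an arithmetic check, the subgroup means reported above reconcile with the overall mean rise of 0.12 when weighted by subgroup size. The Python sketch uses only the rounded values quoted in the text, so agreement is approximate.

```python
# (n, mean change in liver SUV, visit 2 minus visit 1) per chemotherapy pattern, as reported.
SUBGROUPS = {
    "commencing": (63, 0.25),
    "none at either visit": (28, -0.05),
    "discontinuing, lymphoma": (6, -0.22),
    "discontinuing, nonlymphoma": (5, 0.04),
    "at both visits": (30, 0.11),
}

def pooled_mean(groups):
    """Size-weighted mean of subgroup means; returns (total n, pooled mean)."""
    total_n = sum(n for n, _ in groups.values())
    return total_n, sum(n * m for n, m in groups.values()) / total_n

def share_of_net_change(groups, name):
    """Fraction of the summed change (n x mean) contributed by one subgroup."""
    total = sum(n * m for n, m in groups.values())
    n, m = groups[name]
    return n * m / total

total_n, overall = pooled_mean(SUBGROUPS)
print(total_n, round(overall, 3))                               # 132 0.125
print(round(share_of_net_change(SUBGROUPS, "commencing"), 2))   # 0.95
```

Likewise, the lymphoma and nonlymphoma parts of the commencing subgroup (36 × 0.29 and 27 × 0.18) pool to about 0.24, consistent with the quoted 0.25 given rounding.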
Intrapatient Variation for Mean SUV of Blood Pool and Mean SUV of Liver
The visit-to-visit intrapatient variation in blood-pool SUV in our patient cohort had a predominantly gaussian distribution (Fig. 3). The mean of the variation dataset was 0.03. The SD of the variation dataset was 0.42. Using the fifth to 95th percentiles of the dataset, the reference range for visit-to-visit variation based on our patient cohort is −0.8 to 0.9 SUV units.
The visit-to-visit variation in liver SUV was also distributed in a gaussian pattern (Fig. 4). The mean of the dataset was 0.12, with an SD of 0.50. This modest but statistically significant rise appears attributable mostly to the 63 patients commencing chemotherapy between the 2 visits, whose average liver SUV rose by 0.25 units by visit 2.
Taking the fifth and 95th percentiles as the reference range boundaries gives a reference range for variability in liver SUV of −0.9 to 1.1.
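A reference range can be computed either from empirical percentiles of the signed differences or, if the distribution is gaussian, from the mean and SD. The Python sketch below implements both; the percentile helper is a hypothetical stand-in for the statistics package's own routine. Applied to the reported summary statistics, mean ± 2 SD gives −0.81 to 0.87 (blood pool) and −0.88 to 1.12 (liver), which round to the published bounds, whereas a gaussian fifth to 95th percentile interval (mean ± 1.645 SD) is narrower; empirical percentiles from 132 patients need not match either exactly.

```python
import math

def percentile(values, p):
    """Percentile with linear interpolation between order statistics (NumPy's default rule)."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    xs = sorted(values)
    if not xs:
        raise ValueError("no data")
    pos = (len(xs) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def empirical_range(diffs, lower=5.0, upper=95.0):
    """Reference range from empirical percentiles of signed visit-to-visit differences."""
    return percentile(diffs, lower), percentile(diffs, upper)

def gaussian_range(mean, sd, k):
    """mean +/- k SD; k = 1.645 spans a gaussian's 5th-95th percentiles, k = 2 about 95%."""
    return mean - k * sd, mean + k * sd

# Reported summary statistics of the variation datasets (visit 2 minus visit 1).
for name, m, sd in (("blood pool", 0.03, 0.42), ("liver", 0.12, 0.50)):
    lo90, hi90 = gaussian_range(m, sd, 1.645)
    lo2, hi2 = gaussian_range(m, sd, 2.0)
    print(f"{name}: mean+/-1.645SD {lo90:.2f} to {hi90:.2f}; mean+/-2SD {lo2:.2f} to {hi2:.2f}")
```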
Modeling for Variation
An attempt was made to construct a linear regression model for blood-pool SUV and for liver SUV to identify parameters that may predict the change in either. The parameters tested were patient age, weight, and height; patient diabetes status and cancer group (1–12); fasting interval (h); serum glucose level at time of 18F-FDG injection; total injected 18F-FDG dose; recent chemotherapy (yes or no); and time interval between the 2 scans.
None of the parameters tested contributed enough to yield a model with useful predictive power.
We also attempted to show an association between the rise in liver SUV in lymphoma patients commencing chemotherapy and the fall in total tumor burden. In 20 of these 36 patients, sufficient data were available to calculate total tumor burden (defined as total tumor volume × mean SUV of the tumor) at each of the 2 visits. We correlated the change in tumor burden in these 20 patients (tumor burden at visit 2 minus tumor burden at visit 1) with the change in liver SUV between the 2 visits. Linear regression analysis showed no statistically significant association between these 2 values (P = 0.1266). This additional analysis was performed with R (R Foundation for Statistical Computing).
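The tumor-burden analysis can be sketched in Python: burden is computed as defined in the text (total tumor volume × mean tumor SUV), and an ordinary least-squares fit relates change in burden to change in liver SUV. The study performed this step in R; the helpers and the four-patient data here are hypothetical, and converting t to a P value requires a t distribution with n − 2 degrees of freedom (for example, scipy.stats.t.sf), which this standard-library sketch omits.

```python
import math
from statistics import mean

def tumor_burden(total_volume_ml, mean_suv):
    """Total tumor burden as defined in the text: total tumor volume x mean tumor SUV."""
    return total_volume_ml * mean_suv

def simple_regression(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (slope, intercept, r, t for slope)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired lists with at least 3 observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    # t for the slope with n - 2 degrees of freedom; infinite for a perfect fit.
    t = math.copysign(math.inf, r) if abs(r) >= 1 else r * math.sqrt((n - 2) / (1 - r * r))
    return slope, intercept, r, t

# Hypothetical data for four patients: burden change (visit 2 minus visit 1) and liver SUV change.
burden_change = [tumor_burden(10, 3.0) - tumor_burden(80, 6.0),
                 tumor_burden(5, 2.5) - tumor_burden(40, 5.0),
                 tumor_burden(20, 4.0) - tumor_burden(30, 4.5),
                 tumor_burden(0, 0.0) - tumor_burden(60, 7.0)]
liver_change = [0.4, 0.1, -0.2, 0.6]
print(simple_regression(burden_change, liver_change))
```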
DISCUSSION
Knowing the range of variation of baseline values in 18F-FDG PET is clinically useful, because this variation sets limits on the precision of tumor response assessment using 18F-FDG PET. In this study, we examined the 2 tissues most commonly used as a normal baseline reference: the blood pool and the liver. In our cohort of 132 returning patients, we derived a reference range for expected variability in SUV (mean, body weight–based) for the blood pool and liver. Assuming the conventional cutoffs of the fifth and 95th percentiles, the reference range for normal variability in our patient cohort was −0.8 to 0.9 for blood-pool SUV and −0.9 to 1.1 for liver SUV.
The gaussian distribution of the variability supports the interpretation that it represents random, measurement-to-measurement physiologic variability within each patient. It also lends credibility to the view that the reference ranges are a true representation of this normally distributed variability.
Both qualitative and T/B ratio methods of tumor activity assessment rely on comparison with normal tissues, most commonly liver and blood pool. For these methods, the variability range affects the denominator of the comparison and introduces a range of physiologic uncertainty. For example, if a liver metastasis has an SUV of 10 in a normal liver background with an SUV of 2.5, the T/B ratio is 4. Suppose that at the subsequent (postchemotherapy) visit, tumor SUV drops to 6. If the liver SUV drops to 2, the T/B ratio is 3; if the liver SUV remains 2.5, the T/B ratio is 2.4; and if the liver SUV rises to 3.0, the T/B ratio is 2.
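The worked example can be reproduced in a few lines of Python. Expressing each follow-up T/B ratio as a fractional fall from baseline shows that a 40% fall in tumor SUV can appear as a 25% to 50% fall in T/B ratio depending only on the liver value, even though liver changes of −0.5 and +0.5 both lie within the liver reference range reported above.

```python
def tb_ratio(tumor_suv, background_suv):
    """Tumor-to-background ratio."""
    return tumor_suv / background_suv

def tb_fall(tumor_before, bg_before, tumor_after, bg_after):
    """Fractional fall in T/B ratio between two visits (positive means a fall)."""
    return 1 - tb_ratio(tumor_after, bg_after) / tb_ratio(tumor_before, bg_before)

print(f"tumor SUV fall {1 - 6.0 / 10.0:.0%}")
for liver_suv in (2.0, 2.5, 3.0):
    follow_up = tb_ratio(6.0, liver_suv)
    fall = tb_fall(10.0, 2.5, 6.0, liver_suv)
    print(f"liver {liver_suv}: T/B {follow_up:.1f}, fall {fall:.0%}")
```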
It can be argued that using absolute target SUVs for serial time-point comparison avoids the wide variation inherent in T/B ratios. Even then, however, a change in blood-pool SUV and liver SUV well outside the reference range for physiologic variation is a clue to either a confounder affecting all SUV calculations for that particular scan or a systematic bias or drift in SUV technique. Reliance on target SUV alone has many well-documented technical pitfalls. Even when technical error can be excluded from SUV calculations, it appears prudent to be cautious in assuming a clinically significant response in tumors whose change in SUV is comparable to the reference range for normal background variation.
In our patient cohort there was focused attention on the technical details of SUV calculation and exclusion of all known technical confounders; the acquisition, reconstruction, and region-of-interest derivation parameters were kept fixed during the study. The range of variability we observed is therefore likely to represent a realistic best case for reproducibility (that is, a lower limit on variability) under clinical conditions in a service department.
To have clinical utility outside their own parent unit, findings need to be generalizable. In our study, we provided a detailed description of our acquisition methodology to allow readers to assess the similarity of their own acquisition parameters and the applicability of our data.
Our study had several limitations. A cohort of 132 patients is relatively small but still generated a credible reference range for normal variation. Our case mix consisted of roughly equal numbers of lymphoma and nonlymphoma cases, and other malignancies were less well represented. We therefore could not derive reference ranges for subsets of patients, particularly by cancer type, nor could we compare the range of normal background variability across different cancers. Such a study would require extensive case-control matching for patient demographics and chemotherapy between different tumor streams to ensure that any differences, however slight, reflect a primary tumor difference.
Our patient case mix is, to a first approximation, a reasonable representation of Australian oncology patients undergoing serial PET comparison at the time of writing. Our reference ranges may not be directly applicable to other countries or to more specialized patient populations. For our reference ranges to be applicable in similar patient populations, acquisition and reconstruction parameters need to be comparable.
The linear regression model used to look for predictors of blood-pool and liver SUV variability did not identify any statistically meaningful predictors. Yet there was a statistically significant increase in liver SUV from visit 1 to visit 2. Subanalysis by chemotherapy status suggested that the commencement of chemotherapy for lymphoma is largely responsible for this effect in our patient cohort. We hypothesized that this change may be due to a reduced sump effect: large-volume, 18F-FDG–avid lymphoma masses reduce the 18F-FDG available for liver uptake, so with a successful chemotherapy response, relatively more 18F-FDG would be taken up in the liver. We were unable to find evidence to support this hypothesis in the 20 lymphoma patients commencing chemotherapy for whom tumor burden subanalysis was possible.
Regardless of the factors contributing to this systematic rise in liver SUV between the 2 visits, the rise was numerically small compared with the intrapatient variability: the rise in the group’s mean liver SUV was 0.12 whereas the SD for intrapatient variability for liver was 0.5. When compared with the reference range (fifth–95th percentiles in our patient population) of −0.9 to 1.1, this rise is numerically minor. It is also unlikely to be clinically significant.
CONCLUSION
Scan-to-scan intrapatient variation in mean blood-pool and liver SUVs has a gaussian (normal) distribution. This variation is likely to be physiologic and limits the reproducibility of normal-tissue baseline activity. In our group of 132 oncology patients, intrapatient variation in blood-pool SUV had a mean of 0.03 and an SD of 0.42. Intrapatient variation in liver SUV had a mean of 0.12 and an SD of 0.50. Using the fifth and 95th percentiles, the reference range for intrapatient variation in SUV is −0.8 to 0.9 in the blood pool and −0.9 to 1.1 in the liver. These ranges are applicable in clinical practice provided that patient groups and acquisition and reconstruction protocols are comparable to those of our patient cohort. These ranges indicate a practical limit on the precision of tumor response assessment using 18F-FDG PET.
DISCLOSURE
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734. No potential conflict of interest relevant to this article was reported.
Footnotes
Received for publication May 9, 2012.
Accepted for publication November 21, 2012.
Published online Mar. 19, 2013.
© 2013 by the Society of Nuclear Medicine and Molecular Imaging, Inc.
REFERENCES