Abstract
The aim of this study was to evaluate the test-retest variability of standardized uptake values (SUVs) in normal tissues and the impact of various methods for measuring the SUV. Methods: SUVs were determined in 70 cancer-free patients (40 female and 30 male) on 2 occasions an average of 271 d apart. Mean values for body weight and height, blood glucose level, injected dose, and uptake period did not change between the 2 groups of studies. Four regions of interest (ROIs) were placed—on the liver, lung, mediastinum, and trapezius muscle. Mean and maximum SUVs normalized for body weight were obtained, and normalizations were then applied for lean body mass (LBM), LBM and blood glucose level, body surface area (BSA), and BSA and blood glucose level. Results: In the lungs and muscle, metabolic activity within the ROIs was significantly different in the 2 studies, no matter which method was used for the SUVs. The differences ranged from 0.02 to 0.1 for SUV normalized for body weight and SUV normalized for LBM and from 0.001 to 0.002 for SUV normalized for BSA. In the liver, results were similar for all SUVs, except for maximum SUV corrected for LBM and maximum SUV corrected for LBM and blood glucose level. The metabolic activity measured in the mediastinum was also comparable in the 2 studies, regardless of the type of SUV. When investigating whether any normalization method for SUVs reduces variability and improves test-retest concordance, we found no significant superiority for any. The best intraclass correlation coefficients were obtained with the SUV normalized for body weight, in both the liver and the mediastinum, but the coefficients of variation were similar for all 3 mean SUVs that were not corrected for glucose level (range, 10.8%–13.4%). However, normalizing for blood glucose level increased the variability and decreased the level of concordance between studies. 
Conclusion: The SUVs measured in normal liver and mediastinum in cancer-free patients are stable over time, no matter which normalization is used. Correcting for blood glucose level increases the variability of the values and should therefore be avoided. Normalizing for BSA or LBM does not improve the reproducibility of the measurements.
The increased glucose metabolism of cancer cells is used to detect neoplasms with 18F-FDG PET imaging, for either initial staging, posttreatment follow-up, or detection of suspected recurrence. In clinical routine, images are analyzed either qualitatively, using visual comparison of the metabolism in lesions versus normal tissue, or semiquantitatively, using standardized uptake values (SUVs). SUV is defined as the tissue concentration (MBq/mL) divided by activity injected per body weight (MBq/g). Many authors have discussed the multiple factors affecting the SUV, such as weight, plasma glucose level, length of uptake period, partial-volume effects, and recovery coefficient (1–5).
To our knowledge, there has been no published report of the test-retest within-patient variability of SUV in normal tissues. The purpose of this work was to evaluate this issue and to verify whether one correction method for SUV is more reproducible than the others. Both the qualitative and the quantitative evaluation of 18F-FDG PET images could be affected.
MATERIALS AND METHODS
Patients
The data on 70 patients (40 female and 30 male; mean age ± SD, 50 ± 16 y) who each underwent two 18F-FDG PET examinations with normal findings (271 ± 118 d apart) were analyzed retrospectively. The patients were referred for follow-up of lymphoma (41%), melanoma (39%), lung carcinoma (10%), gastrointestinal carcinoma (5%), renal carcinoma (3%), or ovarian carcinoma (2%). None of the patients received anticancer treatment within 3 mo before the studies or experienced tumor recurrence between the 2 studies. All studies were reviewed by 2 experienced nuclear medicine physicians who were fully aware of all clinical data, and all results were found to be strictly normal.
Average body weight (70 ± 12 kg) and height (168 ± 10 cm) were stable between the 2 studies. The change in weight was 0.7 ± 4.2 kg between studies. Only 9 of the 70 patients (13%) had a weight change exceeding 10% of their initial weight. The blood glucose level was measured in all patients before injection. The average glucose level was 4.8 ± 0.9 mmol/L (range, 3.2–9.6 mmol/L) for study A (test) and 4.9 ± 0.9 mmol/L (range, 3.6–8.3 mmol/L) for study B (retest) (P = 0.29).
18F-FDG PET Studies
The dose injected and the uptake period of 18F-FDG were 185 ± 28 MBq (range, 148–244 MBq) and 67 ± 10 min (range, 49–95 min), respectively, for study A and 183 ± 28 MBq (range, 148–262 MBq) (P = 0.59) and 69 ± 9 min (range, 53–102 min) (P = 0.33), respectively, for study B. All patients were imaged from the base of the skull to the proximal thigh, using the C-PET scanner (UGM-Philips), which has been fully described elsewhere (6).
Attenuation correction using a 137Cs source was applied to each study. Each emission scan and transmission scan lasted 5 min and 1 min, respectively. Most patients with lymphoma and melanoma received 5 mg of diazepam orally approximately 15 min before 18F-FDG injection. There was no difference between the numbers of diazepam doses given on the 2 occasions. Images were reconstructed using the ordered-subsets expectation maximization algorithm and were corrected for decay, scatter, random events, and attenuation.
Image Analysis
Four regions of interest (ROIs) were placed at similar levels for each study: on the liver (central region of right lobe), lung (basal region of the right lung, at a distance from the diaphragm), mediastinum (upper region, at the level of the large vessels), and trapezius muscle. These ROIs were placed on a single slice and were of similar size (19.2 cm3 for the liver, 5.6 cm3 for the mediastinum, 13.8 cm3 for the lung, and 4.5 cm3 for the muscle). Mean and maximum SUVs were calculated using the formula SUVBW = tissue concentration (MBq/mL)/[injected activity (MBq)/body weight (g)], where SUVBW is SUV normalized for body weight. Corrections were then applied for lean body mass (LBM) (7) and body surface area (BSA) (8) by substituting LBM or BSA for body weight, yielding SUVLBM and SUVBSA, the SUV normalized for LBM and BSA, respectively. These values were then corrected for blood glucose level, assuming a normal blood glucose level of 5.55 mmol/L (100 mg/dL) (9): SUVG = SUV × blood glucose level (mmol/L)/5.55, where SUVG is SUV normalized for blood glucose level.
The PET scanner efficiency was verified as part of the routine quality-control checks of our center. We performed phantom studies every month to verify the validity of the calibration factor. Our quality-control protocol called for the acquisition of new calibration tables when the offset exceeded 5%, compared with the reference value, but such acquisition was in fact performed systematically twice a year because few fluctuations were actually observed.
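The SUV normalizations described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The LBM and BSA formulas are cited as (7) and (8) but are not reproduced in the text, so the sketch assumes the James formula for LBM and the Du Bois formula for BSA. It also assumes that SUVLBM and SUVBSA are obtained by rescaling SUVBW with LBM (kg) or BSA (m2) in place of body weight (kg), and that the glucose correction scales the SUV by the ratio of measured glucose to 5.55 mmol/L. All function and parameter names are hypothetical.

```python
def suv_bw(tissue_mbq_per_ml, injected_mbq, weight_kg):
    """SUV normalized for body weight: concentration / (dose / weight in g)."""
    return tissue_mbq_per_ml / (injected_mbq / (weight_kg * 1000.0))


def lean_body_mass_kg(weight_kg, height_cm, sex):
    """LBM in kg by the James formula (assumed; the paper only cites ref. 7)."""
    ratio_sq = (weight_kg / height_cm) ** 2
    if sex == "male":
        return 1.10 * weight_kg - 128.0 * ratio_sq
    if sex == "female":
        return 1.07 * weight_kg - 148.0 * ratio_sq
    raise ValueError("sex must be 'male' or 'female'")


def body_surface_area_m2(weight_kg, height_cm):
    """BSA in m2 by the Du Bois formula (assumed; the paper only cites ref. 8)."""
    return 0.007184 * weight_kg ** 0.425 * height_cm ** 0.725


def suv_lbm(suv_bw_value, weight_kg, lbm_kg):
    """Rescale SUVBW so that LBM replaces body weight (assumed convention)."""
    return suv_bw_value * lbm_kg / weight_kg


def suv_bsa(suv_bw_value, weight_kg, bsa_m2):
    """Rescale SUVBW so that BSA (m2) replaces body weight (assumed units)."""
    return suv_bw_value * bsa_m2 / weight_kg


def glucose_corrected(suv, glucose_mmol_per_l, reference_mmol_per_l=5.55):
    """Scale an SUV to the reference blood glucose level of 5.55 mmol/L."""
    return suv * glucose_mmol_per_l / reference_mmol_per_l


# Illustrative values only (not patient data from the study).
base = suv_bw(tissue_mbq_per_ml=0.005, injected_mbq=185.0, weight_kg=70.0)
lbm = lean_body_mass_kg(70.0, 168.0, "female")
bsa = body_surface_area_m2(70.0, 168.0)
print(base, suv_lbm(base, 70.0, lbm), suv_bsa(base, 70.0, bsa),
      glucose_corrected(base, 4.8))
```

Because BSA in m2 is numerically much smaller than body weight in kg, SUVBSA values computed this way are far smaller than SUVBW values, which agrees with the smaller absolute differences reported for SUVBSA.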
Statistical Analysis
Results are expressed as mean ± SD. Mean differences between the 2 studies (test and retest) for each SUV and each organ were assessed using a paired Student t test. Furthermore, the within-individual variation for each SUV (SDw) was calculated using the following formula: SDw = √(∑d²/2n), where d represents the difference between test and retest values for each subject and n the number of subjects. SDw was then expressed in terms of coefficient of variation (CV) as follows: CV (%) = 100 × SDw/M, where M denotes the average value of test and retest means; that is, M = (mean test + mean retest)/2. In addition, to estimate the degree of agreement between test and retest measures, we computed the intraclass correlation coefficient (ICC). The closer the ICC is to 1, the better the agreement. A lower 95% confidence boundary for the ICC was also calculated to test the statistical significance of the observed ICC. To compare test-retest differences between SUV measurement methods, a general linear mixed-model approach was used to account for repeated measurements on the study subjects. All results were considered to be significant at the 5% critical level (P < 0.05). Calculations were performed using version 8.2 of the statistical package of SAS Institute Inc.
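The test-retest statistics defined above can be illustrated with a short Python sketch. SDw and CV follow the formulas given in the text. The ICC variant is not specified in the paper, so the sketch assumes a one-way random-effects ICC with 2 measurements per subject; the original analysis was run in SAS and may have used a different form. The paired t statistic is included for completeness, without its P value. Function names and the sample values are hypothetical.

```python
import math


def within_subject_sd(test, retest):
    """SDw = sqrt(sum(d^2) / (2n)), with d the test-retest difference."""
    d_squared = sum((a - b) ** 2 for a, b in zip(test, retest))
    return math.sqrt(d_squared / (2 * len(test)))


def coefficient_of_variation(test, retest):
    """CV (%) = 100 * SDw / M, with M the average of the two study means."""
    m = (sum(test) / len(test) + sum(retest) / len(retest)) / 2
    return 100.0 * within_subject_sd(test, retest) / m


def paired_t_statistic(test, retest):
    """t statistic of the paired Student t test on the differences."""
    d = [a - b for a, b in zip(test, retest)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def icc_one_way(test, retest):
    """One-way random-effects ICC for k = 2 measurements (assumed variant)."""
    n, k = len(test), 2
    subject_means = [(a + b) / 2 for a, b in zip(test, retest)]
    grand_mean = sum(subject_means) / n
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for a, b, m in zip(test, retest, subject_means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Illustrative SUVs for 4 hypothetical subjects (not data from the study).
study_a = [2.0, 2.2, 1.8, 2.4]
study_b = [2.1, 2.0, 1.9, 2.5]
print(within_subject_sd(study_a, study_b))
print(coefficient_of_variation(study_a, study_b))
print(paired_t_statistic(study_a, study_b))
print(icc_one_way(study_a, study_b))
```

With 2 measurements per subject, the within-subject mean square in this ICC equals SDw squared, so the CV and the ICC draw on the same within-subject variance but scale it differently: the CV relative to the mean, the ICC relative to the between-subject spread.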
RESULTS
All SUVs are given in Table 1 (average SUVs) and Table 2 (maximum-pixel-value SUVs), along with the mean differences between the 2 studies. In lungs and muscle, metabolic activity within the ROIs was significantly different in studies A and B, no matter which correction method was used for the SUVs. Differences in the lungs ranged from 0.02 to 0.09 for SUV normalized for body weight and SUV normalized for LBM and from 0.001 to 0.002 for SUV normalized for BSA. In muscle, differences ranged from 0.04 to 0.1 for SUV normalized for body weight and SUV normalized for LBM and from 0.001 to 0.002 for SUV normalized for BSA. In liver, the only SUVs that differed significantly between the 2 studies were maximum SUVs normalized for LBM and for LBM and blood glucose level. The metabolic activity measured in the mediastinum was similar in the 2 studies, regardless of the type of SUV. Although the average uptake time was not significantly different between the 2 series of studies, it did show some individual variations. The change in time was 1.5 ± 12.9 min, and no correlation was found between change in time and any of the SUVs.
To evaluate the impact of the method of measurement on the variability of the results, we focused our further analyses on the mediastinum and the liver, for which the metabolic activity remained stable between studies A and B. The CVs (Table 3) ranged from 11% to 19% for the liver and from 12% to 19% for the mediastinum. The lowest values were obtained with mean SUVs normalized for body weight, for BSA, and for LBM (CV = 11%). The values were slightly higher for the corresponding maximum pixel values and significantly higher when correction for blood glucose level was applied.
The ICC was significantly positive for all types of SUVs, although agreement was only fair, at best (Table 3). In the liver, the highest values for the ICC were obtained with mean SUV normalized for body weight and maximum SUV normalized for body weight or for LBM (ICC > 0.6). In the mediastinum, only mean SUV normalized for body weight and maximum SUV normalized for body weight showed an ICC > 0.6.
The general linear mixed-model analysis did not reveal any significant advantage or disadvantage for any of the SUVs (normalized for body weight or LBM, average or maximum pixel value, corrected for blood glucose level or not).
DISCUSSION
The measure of SUV is widely used either to categorize a lesion as malignant or benign or to stage and monitor cancer with 18F-FDG PET scanning. However, many well-known factors can affect the accuracy of SUV measurement, including patient weight, blood glucose level, length of uptake period, partial-volume effect, recovery coefficient, and type of ROI (1–5). In view of the variation of 18F-FDG plasma clearance during chemotherapy, some authors have proposed the use of tumor-to-background ratio as an adjunct (2) to monitor the reliability of SUV. Despite these factors, the reproducibility of SUV in tumor tissue within a 10-d period has been estimated at ±10% (10).
The purpose of the present work was 2-fold: first, to evaluate whether the metabolic activity of various normal tissues remains stable over time in a single patient population, and second, to assess the SUV variability associated with the different corrections and normalizations that can be applied.
We found that, on average, all SUVs in the mediastinum and most SUVs in the liver remained stable over time. We used a statistical methodology slightly different from that of Minn et al. (10), but the CV, which we calculated, and the percentage difference, which Minn et al. calculated, measure the same parameter and provide almost the same values. In our population, the variations we found in the liver and mediastinum (11% and 12.3%, respectively) were similar to those Minn et al. found in tumors (10%).
On the other hand, all mean SUVs in the lungs and muscle differed significantly between the 2 studies. Even though these differences were statistically significant, their clinical relevance remains questionable: regarding mean SUV, the maximum difference observed was 0.05 ± 0.17 in lung and 0.07 ± 0.21 in muscle. In addition, several factors may contribute to these findings. The pulmonary or cardiovascular status of these patients may change over 2.5–16 mo, and such changes may in turn be reflected in 18F-FDG uptake in the lung. Muscle uptake may be influenced by the patient's stress or anxiety at the time of injection and by other factors such as room temperature. Patients who are more accustomed to PET examination may have lower uptake on the second scan, which could explain why most SUVs decreased from study A to study B. Also, because the lung and muscle regions have low 18F-FDG metabolism, they are more easily affected by statistical noise. At high SUVs, the measurements reflect mainly the phosphorylated fraction of 18F-FDG, but at lower SUVs, the proportion of nonphosphorylated 18F-FDG is higher, which in turn contributes to higher variability. In any case, our study showed that if tumor-to-background activity ratios are to be used, the liver and mediastinum, rather than the lung or muscle, are to be preferred for measuring background activity.
However, although this observation appears valid in cancer-free patients, it needs to be confirmed in cancer patients.
The SUV calculation includes a calibration factor that takes into account scanner efficiency (image counting rate per voxel vs. MBq/mL). Because scanner efficiency may vary significantly over time, this issue may influence the SUVs and contribute to their variability, independently of any variation in the true metabolic activity. This issue was not a factor in the present study, as scanner efficiency was found to be stable over time. The normalization tables were renewed according to our routine quality-control guidelines, that is, on a systematic basis rather than as a result of changes observed in the phantom studies. Indeed, SUV changes did not exceed 5% for any of the phantom studies during this investigation.
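The quality-control rule described above (new calibration tables when the phantom offset exceeds 5% of the reference value) can be written as a simple check. This is a hypothetical sketch: the function names, the 5% default, and the sample readings are illustrative and do not describe the center's actual software.

```python
def percent_offset(measured, reference):
    """Signed offset of a phantom measurement from its reference, in percent."""
    return 100.0 * (measured - reference) / reference


def needs_recalibration(measured, reference, tolerance_percent=5.0):
    """True when the absolute offset exceeds the tolerance (5% by default)."""
    return abs(percent_offset(measured, reference)) > tolerance_percent


# Illustrative monthly phantom readings against a reference value of 1.00.
for reading in (1.03, 0.98, 1.06):
    print(reading, needs_recalibration(reading, reference=1.00))
```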
To evaluate the impact of the method of measurement on reproducibility of values, the statistical analysis focused on tissues that did not change between the 2 studies, that is, mediastinum and liver. The mean SUVs observed in these organs were within the range of those previously described in the literature for a study using a comparable method of measurement (iterative reconstruction with segmented attenuation correction) (4). The liver SUV has previously been described as being constant between patients (4,11), but to our knowledge, the present study was the first attempt to evaluate the long-term variability of SUV in normal tissues.
When examining whether any of the correction methods for SUV reduces variability and improves test-retest agreement, we found none to be significantly superior. The best ICCs were obtained for SUV normalized for body weight, in both the liver and the mediastinum, but the CVs were similar for all 3 mean SUVs that were not corrected for glucose level. We did find, however, that normalizing for blood glucose level increased the variability and decreased the level of concordance between studies, possibly in relation to the low variability of blood glucose between the 2 studies and to the absence of extreme values of blood glucose level in our patient population. The same phenomenon was reported by Menda et al. (9). We also observed a trend for the mean SUV to be better than the maximum SUV, but this held true only for CV. No significant difference was seen for ICC or when performing the general linear mixed-model analysis. In fact, a higher variability of maximum SUV could be expected because it is more affected by statistical noise than is mean SUV.
Our results tend to support those of Menda et al. (9), who showed no benefit to any correction method in lung tumors. Some authors have reported better results with SUV normalized for BSA in tumor tissues (12–16), whereas others found SUV normalized for LBM to be better in normal tissues (8,11). However, the clinical superiority of one SUV method over another observed for tumor metabolic assessment does not necessarily imply similar conclusions for the reproducibility of metabolic measurements in normal tissue. On the other hand, we did not observe better agreement among SUVs normalized for BSA or LBM in our population. The subjects’ body weight did not change significantly from one study to the other. Body weight increased or decreased by 10% or more in only 9 patients. Such a small number does not permit any meaningful statistical analysis. Obviously, in a population sample whose body height and weight remain fairly stable over time, one cannot expect significant differences between the various methods of SUV measurement. However, during and after chemotherapy for cancer, patients can lose significant weight, so that such normalizations may become significant. In further evaluating this issue, it would be interesting to verify these results in cancer patients before and after treatment.
CONCLUSION
SUVs measured in normal liver and mediastinum in cancer-free patients are stable over time, no matter which correction method of SUV is used. Correcting for blood glucose level increases the variability of the values and should thus be avoided. Normalizing for BSA or LBM does not improve the reproducibility of the measurements.
Acknowledgments
Part of this work was presented at the European Association of Nuclear Medicine 16th Annual Congress, Amsterdam, The Netherlands, August 23–27, 2003.
Footnotes
Received Sep. 23, 2003; revision accepted Dec. 12, 2003.
For correspondence or reprints contact: Roland Hustinx, MD, PhD, Division of Nuclear Medicine, University Hospital of Liège, Sart-Tilman B35, 4000 Liège 1, Belgium.
E-mail: rhustinx@chu.ulg.ac.be