Test–Retest Reproducibility of 18F-FDG PET/CT Uptake in Cancer Patients Within a Qualified and Calibrated Local Network

Brenda F. Kurland; Lanell M. Peterson; Andrew T. Shields; Jean H. Lee; Darrin W. Byrd; Alena Novakova-Jiresova; Mark Muzi; Jennifer M. Specht; David A. Mankoff; Hannah M. Linden; Paul E. Kinahan

doi:10.2967/jnumed.118.209544

Abstract

Calibration and reproducibility of quantitative ¹⁸F-FDG PET measures are essential for adopting integral ¹⁸F-FDG PET/CT biomarkers and response measures in multicenter clinical trials. We implemented a multicenter qualification process using National Institute of Standards and Technology–traceable reference sources for scanners and dose calibrators, and similar patient and imaging protocols. We then assessed SUV in patient test–retest studies. Methods: Five ¹⁸F-FDG PET/CT scanners from 4 institutions (2 in a National Cancer Institute–designated Comprehensive Cancer Center, 3 in a community-based network) were qualified for study use. Patients were scanned twice within 15 d, on the same scanner (n = 10); different but same model scanners within an institution (n = 2); or different model scanners at different institutions (n = 11). SUV_max was recorded for lesions, and SUV_mean for normal liver uptake. Linear mixed models with random intercept were fitted to evaluate test–retest differences in multiple lesions per patient and to estimate the concordance correlation coefficient. Bland–Altman plots and repeatability coefficients were also produced. Results: In total, 162 lesions (82 bone, 80 soft tissue) were assessed in patients with breast cancer (n = 17) or other cancers (n = 6). Repeat scans within the same institution, using the same scanner or 2 scanners of the same model, had an average difference in SUV_max of 8% (95% confidence interval, 6%–10%). For test–retest on different scanners at different sites, the average difference in lesion SUV_max was 18% (95% confidence interval, 13%–24%). Normal liver uptake (SUV_mean) showed an average difference of 5% (95% confidence interval, 3%–10%) for the same scanner model or institution and 6% (95% confidence interval, 3%–11%) for different scanners from different institutions. Protocol adherence was good; the median difference in injection-to-acquisition time was 2 min (range, 0–11 min). Test–retest SUV_max variability was not explained by available information on protocol deviations or patient or lesion characteristics. Conclusion: ¹⁸F-FDG PET/CT scanner qualification and calibration can yield highly reproducible test–retest tumor SUV measurements. Our data support use of different qualified scanners of the same model for serial studies. Test–retest differences from different scanner models were greater; more resolution-dependent harmonization of scanner protocols and reconstruction algorithms may be capable of reducing these differences to values closer to same-scanner results.

Quantitative ¹⁸F-FDG PET/CT can measure molecular changes at multiple tumor sites and has been used to evaluate early response to cancer therapy (1). Biologic variability, such as body weight, glucose levels, and lesion location, is a fundamental source of ¹⁸F-FDG SUV_max quantitation error that cannot be controlled. However, other sources of variability related to patient preparation, image acquisition, and scanner calibration can be controlled and minimized. Previous same-scanner test–retest studies have achieved average variability of 10%–12% (2,3). Scanner qualification (4) and standardization of patient preparation and imaging protocols (5,6) may reduce measurement error. Consistency in scanner protocol parameters such as uptake time, image reconstruction, and scanner maintenance may limit machine error to less than 10% (7–9), but inconsistent or nonoptimized protocols can add error ranging from 18% to more than 40% (7,10,11). In addition, deviations from standards are common even under the scrutiny of a test–retest study (12–14). Measurement error and bias in quantitative PET measures will influence sample size and other study characteristics (15–18).

Published patient test–retest studies have used the same scanner (2,3,13,14), or scanners at the same institution from the same manufacturer (19); guidelines for using ¹⁸F-FDG PET/CT to assess response to therapy in multicenter trials strongly recommend using the same scanner for serial measurements (6,20). Allowing serial measurements from different PET/CT scanners would remove a barrier to accrual. For example, a second pretreatment scan could be avoided if a diagnostic scan from a community site could be used as the baseline scan for a phase I study (where intensive monitoring requires treatment at an academic site). However, allowing serial scans from different sites would require prospective multicenter validation of ¹⁸F-FDG uptake quantification. We have described a rigorous qualification process using National Institute of Standards and Technology–traceable reference sources for scanners and dose calibrators (21). This study assesses differences in SUV in tumors and in normal liver from test–retest studies scanning patients with the same or different scanners or sites uniformly calibrated and following a similar imaging protocol. We hypothesize that this approach will yield acceptable levels of test–retest precision in ¹⁸F-FDG uptake measures.

MATERIALS AND METHODS

Multicenter Consortium and PET/CT Scanner Qualification

Five scanners were used from within the University of Washington Medical Center/Seattle Cancer Care Alliance network: 2 GE Healthcare Discovery STE PET/CT scanners (“same model”); and a Philips Gemini TF 64, a Siemens Biograph 6 and a Siemens Biograph 20 mCT at network sites. Scanner characteristics are listed in Table 1. Before patient scans, sites underwent qualification. Scanner and dose calibrator performance were assessed with repeat measurements of National Institute of Standards and Technology–traceable, long-lived reference sources (⁶⁸Ge, half-life of 271 d) (21,22). In each round of measurements, a cylindric scanner source (phantom) was scanned using a clinical whole-body protocol, and a smaller source was measured in the dose calibrator using ¹⁸F-FDG settings. Performance measurements were completed every 3 mo and submitted for assessment of signal bias. Scanners were considered qualified after 3 successive rounds of measurements showed stable bias (< ∼5% variation). Details of ⁶⁸Ge/⁶⁸Ga PET dose calibrator and scanner cross-calibration kit and scan results are reported elsewhere (21).

View this table:

TABLE 1

Scanning Protocol Characteristics

In addition to scanner qualification, a nuclear medicine technologist traveled to each site to observe patient preparation protocols. Sites agreed to adhere to clinical protocol guidelines (similar to the eventual Uniform Protocols for Imaging in Clinical Trials [UPICT] protocol (6)) for parameters that might affect SUV bias, such as time between injection and image acquisition, patient fasting requirements, and injected dose (Supplemental Table 1; supplemental materials are available at http://jnm.snmjournals.org).

Patient Eligibility

Patients with pathologically confirmed solid malignancies who were undergoing an ¹⁸F-FDG PET/CT scan for tumor staging or restaging were eligible for the study. Patients were required to have either no cancer treatment at the time of imaging or chronic treatment that had not changed for at least 3 mo. Study enrollment and informed consent were required for the second scan, since the first scan was clinically indicated. This study was approved by the institutional review committee for imaging at all network sites, and all patients in the study signed an informed consent form.

¹⁸F-FDG Imaging

Two ¹⁸F-FDG PET/CT scans were scheduled on 2 separate days within 2 wk. The location of the second scan was dependent on scanner availability and the patient’s willingness to travel to another site. ¹⁸F-FDG dose (259–407 MBq recommended) was measured in a dose calibrator. The injection syringe and intravenous catheter were measured for residual activity after removal. The emission scan was started at 1 h ± 10 min after injection.

Image Analysis

A single certified nuclear medicine physician reviewed each image set, recording anatomic location, SUV_max, and slice location of the SUV_max pixel for each lesion. A second nuclear medicine radiologist verified each lesion location. Discrepancies (surgical inflammation; additional lesions to report) were resolved by consensus before data analysis. Cubic regions of interest of 3 × 3 pixels were drawn over the portion of each identified lesion with the most uptake on 3 consecutive slices (for measuring tumor SUL_peak). Up to 25 lesions were analyzed, selecting the most ¹⁸F-FDG–avid. A spheric region of interest (3-cm diameter) drawn on the liver (right lobe) assessed SUV_mean in normal soft tissue (1).

Statistical Analysis

For each region of interest, both difference in uptake (Eq. 1) and percentage uptake difference (Eq. 2) between ¹⁸F-FDG scans were calculated. For test–retest at different institutions, difference scores are positive if the community-based network scanner SUV is higher. Difference in log(SUV_max) (Eq. 3) was used to calculate the repeatability coefficient (RC) as previously reported, for the most ¹⁸F-FDG–avid lesion at the first scan and for the average SUV_max of up to 7 lesions (13,14), using the 7 most ¹⁸F-FDG–avid lesions. The RC was also calculated, accommodating multiple lesions per patient in the analysis using a variance estimate (sum of between-subject and within-subject variance (23)) from a linear mixed-effects regression model of d_RC with random intercept.Eq. 1Eq. 2Eq. 3

Three groups were of interest: group A was patients studied on the same scanner, group B was patients studied on different scanners of the same model within the same institution, and group C was patients studied on different scanner models at different institutions. Because only 2 patients were in group B, we anticipated combining groups for statistical comparisons.

This study addresses reproducibility across different scanners, where an average bias of zero is not assumed. Therefore, the primary analysis emphasizes Bland–Altman limits of agreement (centered around average difference) rather than the RC (centered around zero). For testing group differences, |D| was selected as the primary endpoint to facilitate interpretation as absolute percentage difference (11). Linear mixed-effects regression models were fitted to measure associations between test–retest difference (|D|) and scanning group, patient-level, and lesion-level characteristics. A common offset (random intercept) accommodated multiple lesions per patient, and deletion diagnostics checked that primary results were not unduly influenced by data from any individual patient. The dependent variable was log-transformed to satisfy linearity assumptions (Eq. 4).Eq. 4

Log-transformed absolute percentage difference was also used to evaluate the concordance correlation coefficient, a measure of agreement encompassing both bias and variability (24). When directionality as well as magnitude was part of the relationship between outcome and predictor (as for differences in uptake time), difference in log(SUV_max) (Eq. 3) was the dependent variable. Statistical analyses used SAS/STAT software, version 9.4 (SAS Institute, Inc.) (25).

RESULTS

Twenty-three patients were included (20 female, 3 male) (Table 2), of 26 patients enrolled from 2012 to 2015. Two excluded patients had no ¹⁸F-FDG PET/CT–evaluable lesions (1 patient with no lesions, 1 with diffuse uptake only); the third withdrew before the repeat scan because of distress over the clinical scan results. Most patients had breast cancer, but patients with other cancers were also enrolled. The median time between scans was 9 d (range, 1–15 d). Ten patients were studied in the same scanner or site, whereas 13 were studied in different scanners or sites (2 within the same institution, 11 at different institutions with different scanner manufacturers and technologists). One site did not maintain scanner qualification and was disqualified from enrolling additional patients. The same institution injected 3 patients with greater than the protocol-specified maximum of 407 MBq. Another site allowed study entry for a patient with 182 mg/dL blood glucose (175 mg/dL protocol-specified maximum).

View this table:

TABLE 2

Patient Characteristics

Scan and lesion characteristics are summarized in Supplemental Table 2 for 162 lesions (82 bone, 80 soft tissue). The average injected dose was approximately 370 MBq (10 mCi). Uptake time ranged from 54 to 70 min and did not differ by more than 11 min between scans. Mean glucose level was 93 mg/dL for the first scan and 94 mg/dL for the second scan (overall range, 78–182 mg/dL). The average weight was 76.2 kg (range, 49–133.2 kg). Two of the 3 greatest differences were from the 2 patients recruited from 1 community site; the weights at that site were lower (by 4.8 and 2.6 kg) than measurements from the academic site.

Lesion SUV_max Difference

Median percentage uptake difference in SUV_max (Eq. 2) was 11.9% (range, 0.1%–97.0%), and median difference in SUV_max units was 0.6 (range, 0.0–19.1). Figure 1 shows Bland–Altman plots for SUV_max for the 2 scans, separately for 10 patients with repeat scans using the same scanner (panel A), 2 patients imaged with different scanners of the same model (panel B), and 11 patients imaged on 2 different scanners at 2 different sites (academic and network site, panel C). SUV_max for the 162 lesions ranged from 1.0 to 28.8 (average for the repeated scans). Test–retest agreement appears to be better for the same scanner model (panels A and B) than for different models at different sites (panel C).

FIGURE 1.

Bland–Altman plots of difference in SUV_max vs. average SUV_max: 10 patients (51 lesions) with repeat scans using same scanner (A); 2 patients (34 lesions) using different scanners from same academic institution (B); and 11 patients (77 lesions) using different scanners from different sites (C). Within each panel, plotting character/color is same for multiple lesions in single patient. Dashed lines = average difference and 95% limits of agreement. The 2 lesions from melanoma patient (SUV_max, 38.3 and 25.0 on first scan and 19.2 and 16.4 on second scan) are not shown in C but contribute to limits-of-agreement calculations.

Although the median SUV_max was almost 1 unit lower for the same scanner condition (A) than for different scanners (B and C) (Supplemental Table 2), lesions with low ¹⁸F-FDG avidity (<3), medium avidity (3–7), and high avidity (>7) were present in both conditions. However, the 2 patients in panel B had no lesions with an SUV_max of less than 4.4. Most (73%) of the absolute differences were less than 1 SUV_max unit. Fourteen of 23 patients (61%) did not have any lesions with an SUV_max difference of 1 unit or more.

Figure 2 shows image examples: 1 patient with 9 bone lesions studied twice in the same scanner model, and another with 17 mixed bone and soft-tissue lesions studied in a Discovery STE and a Biograph 20 mCT.

FIGURE 2.

(A) Coronal images from 60-y-old woman with stage IV ductal breast carcinoma (blue circles in Fig. 1A, same scanner). SUV_max for 9 evaluable lesions ranged from 3.4 to 5.1 (average, 4.0) for first scan and from 3.1 to 4.9 (average, 4.2) for second scan. Percentage difference was −16% to +16% (average, 3.9%); SUV unit difference was −0.62 to +0.64 (average, 0.15). (B) A 73-y-old woman with stage IV mixed ductal/lobular breast carcinoma (yellow circles in Fig. 1C, different institutions). SUV_max for 17 evaluable lesions was 2.0–12.2 (average, 4.8) for first scan and 1.9–12.0 (average, 4.4) for second scan. Percentage difference was −24% to +25% (average, −7.1%); SUV unit difference was −1.4 to +1.0 (average, −0.39). Normal liver SUV_mean was 2.5 (A) and 2.6 (B) in both scans.

Predictors of Test–Retest Differences in Lesion SUV_max

Mixed-effects models are summarized in Table 3. Model 1 shows a fitted linear mixed-effects model for the 3 scanning scenarios (Fig. 1), suggesting that the 2 patients scanned on different scanners of the same model can be combined with the same-model patients for further analysis. This analysis (model 2) finds an average difference in SUV_max of 8% (95% confidence interval, 6%–10%) for test–retest studies on the same scanner model at the same institution, and an average difference of 18% (13%–24%) when the test–retest scans are performed at different qualified sites and on different scanner models. The overall concordance correlation coefficient was 0.91 (95% bootstrap confidence interval, 0.85–0.94), 0.97 for the same site (0.95–0.98), and 0.84 for different sites (0.74–0.90).

View this table:

TABLE 3

Linear Mixed-Effects Models (Linear Regression for Liver)

The model 2 estimates shown in Table 3 were robust to sensitivity analysis, such as removing the melanoma patient’s 2 tumors that had an extremely high SUV_max. They were also similar for SUL_peak (Supplemental Fig. 1).

Exploratory subgroup analyses examining patient and scanner factors are summarized in Supplemental Table 3. Controlling for scanning site, bone lesions had test–retest reproducibility at least as good as for soft-tissue lesions. Other patient and scanner factors did not appear to affect the magnitude of test–retest differences.

Figure 3 shows Bland–Altman plots for (signed) percentage difference in SUV_max. The magnitude of test–retest variability and differences between same-model and different-model conditions are similar to the results shown in Table 3 and Figure 1: percentage SUV_max differences were generally lower for lesions in patient studies in the same scanner than for those on 2 different scanner models. Estimated 95% RCs and coefficient of variation (from log-transformed SUV_max, as previously published (13,14)) are summarized in Supplemental Table 4 and shown graphically in Supplemental Figure 2, along with the Quantitative Imaging Biomarkers Alliance (QIBA) profile SUV_max 95% limits of same-scanner repeatability (14,20).

FIGURE 3.

Percentage difference in SUV_max vs. average SUV_max: 12 patients (85 lesions) with repeat scans using same scanner or different scanners from same unit (combined data from Figs. 1A and 1B) (A); 11 patients (77 lesions) using different scanners from different sites (B). Plotting character/color identifies multiple lesions in single patient, as for Figure 1. Dashed lines = average percentage difference and 95% limits of agreement.

Liver SUV_mean

Mean liver uptake (Fig. 4) was consistent, with little between-patient variation around the average SUV_mean of 2.4, and differences within 0.5 units for repeat within-patient scans. Linear regression (with log-transformed absolute value of percentage difference, as above for the lesion-level analysis) found the average percentage difference to be similar, 5%–6% for both the same scanner or site and different sites (Table 3). A linear mixed-effects model controlling for site did not support an association between magnitude of percentage difference in liver SUV_mean and lesion SUV_max (P = 0.12, with higher liver test–retest differences predicting slightly lower tumor test–retest).

FIGURE 4.

Bland–Altman plot of liver SUV_mean (n = 23). Light-green circles = same scanner; dark-green circles = different scanners from same site; gray circles = different scanner models from different sites; dashed lines = average difference and 95% limits of agreement.

DISCUSSION

After qualification including calibration with a common reference object, SUV_max was highly reproducible for 10 breast cancer patients with test–retest studies on the same scanner and for 2 breast cancer patients scanned on different scanners of the same model (with shared service personnel and imaging protocols). The estimated within-subject coefficient of variation of 9% (Supplemental Table 4) was lower than the average of 11% for other same-scanner test–retest studies in oncology patients (Table 3 in Lodge (11)). In contrast, 11 patients with repeat scans on different scanner models showed a within-subject coefficient of variation of 22%, with observation of both bias (each lesion with higher SUV_max on one scan than the other) and variability (different lesions with higher and lower SUV_max between scans for the same patient) (Fig. 1). The 95% RC of (−21%, 26%) for same-model test–retest is within the (−28%, 39%) QIBA ¹⁸F-FDG PET/CT profile limits for single-center studies using the same scanner (20), whereas the 95% RC of (−42%, 73%) for different models does not appear to meet the QIBA profile standards (Supplemental Fig. 2). No patient, lesion, or scanning protocol features clearly predicted test–retest variability, in part because of rigorous control of factors such as uptake time.

The SUV in a normal region of liver is a standard method to assess the validity of tumor ¹⁸F-FDG uptake estimates (1). Our average liver SUV_mean of 2.4 with an average absolute difference of 0.19 (SD, 0.16) was similar to the results of previous studies (26,27). We did not adjust SUV_max for uptake time or for normal liver or blood uptake but would expect results similar to those of a recent study (28), in which lack of variability in uptake times and blood uptake diminished the impact of adjustment algorithms in improving test–retest agreement of tumor uptake measurements.

Uptake in large, uniform regions such as the liver is not affected by resolution effects such as partial-volume errors. Scanner calibration would be expected to minimize test–retest variance even between different makes and models of scanners, as we observed: test–retest variability in normal liver uptake appeared similar in the same and different scanner models (Fig. 4), unlike the greater variability in lesion uptake measured in different scanners (Fig. 1). Most lesions do not have uniform ¹⁸F-FDG uptake over a large area, so they are known to have size-dependent resolution effects (29). Variation in size-dependent bias for different types of scanners motivates the ongoing work in harmonization of reconstruction algorithms and other scanner features in multicenter trials (30,31).

Although the true activity is known for the National Institute of Standards and Technology–traceable sources, measured PET image activity for the epoxy calibration phantom may be biased by manufacturer-dependent CT-based attenuation and scatter correction effects. The calibration phantoms could therefore not evaluate absolute scanner calibration; however, their spatial uniformity and temporal stability still permitted precise monitoring of scanner calibration consistency (21). By monitoring every 3 mo, we could evaluate the effects of periodic scanner recalibration and identify any long-term drifts in scanner bias. Low variability in test–retest liver uptake measures, regardless of manufacturer, supports the efficacy of our scanner calibration.

A limitation of this study is that it had a relatively small sample size and that the same site/institution group included only breast cancer patients. In addition, because no patients had both ¹⁸F-FDG PET/CT scans outside the academic institution we could not assess same-scanner test–retest agreement at network sites. An exploratory subgroup analysis did not identify lesion or scanning protocol factors with strong effects on test–retest SUV_max agreement (Supplemental Table 3). However, these analyses were not powered to assess lesion location (e.g., propensity for motion artifacts or subcutaneous nodules with compromised attenuation) or lesion type (e.g., high-uptake, inflammatory melanoma lesions). Some scanner characteristics, such as a voxel size greater than 4 mm, did not fall within the eventual UPICT standard (Supplemental Table 1). Finally, we did not control spatial resolution between scanners, nor did we attempt to quantify the effect of variable noise or image reconstruction parameters on SUV_max. A higher average SUV_max for community scanners that mostly used ultra-high-definition reconstruction (Fig. 1C; Table 1) is consistent with studies with multiple reconstructions of the same images (32).

CONCLUSION

This study shows that ¹⁸F-FDG PET/CT scanner calibration and qualification, with consistent imaging protocols, can yield highly reproducible SUV measurements; test–retest error for the same scanner or same scanner model (within the same institution) is similar to or lower than estimates in prior test–retest studies. If our findings for different scanners of the same model are confirmed, clinical trials that apply these qualification, calibration, and quality control criteria could increase patient recruitment by allowing serial measurements from similar scanner models at different sites. Additionally, reducing test–retest variation reduces the required number of patients for a given study power (15,18). Before considering use of different scanner models for serial measurements, though, future studies should incorporate modern guidelines such as the UPICT protocol (6) or the QIBA profile (20) and explore harmonization techniques proposed to overcome inherent differences in acquisition and reconstruction methods (31).

DISCLOSURE

This work was supported by NIH grants U01CA148131, U01CA190254, R50CA211270, P30CA015704, P30CA047904 (Biostatistics), R01CA169072, and NCI-SAIC-24XS036-004. No other potential conflict of interest relevant to this article was reported.

Acknowledgments

We thank the Seattle Cancer Care Alliance Network, as well as the physicians, technologists, and physicists from the University of Washington Medical Center, the Seattle Cancer Care Alliance, Harborview Medical Center, Tacoma General, and Skagit Valley Medical Center who helped make this study possible. We also thank Nuclear Medicine Technologists Lisa Dunnwald, Amy Quinn, and Patrick Clark for network site visits; Rebecca Christopfel for administrative assistance; and the patient volunteers. We also acknowledge helpful discussions with QIBA and National Cancer Institute Quantitative Imaging Network (QIN) members.

Footnotes

Published online Oct. 25, 2018.

REFERENCES

1.↵
1. Wahl RL,
2. Jacene H,
3. Kasamon Y,
4. Lodge MA
. From RECIST to PERCIST: evolving considerations for PET response criteria in solid tumors. J Nucl Med. 2009;50(suppl 1):122S–150S.
OpenUrl Abstract/FREE Full Text
2.↵
1. Weber WA,
2. Ziegler SI,
3. Thodtmann R,
4. Hanauske AR,
5. Schwaiger M
. Reproducibility of metabolic measurements in malignant tumors using FDG PET. J Nucl Med. 1999;40:1771–1777.
OpenUrl Abstract/FREE Full Text
3.↵
1. Nahmias C,
2. Wahl LM
. Reproducibility of standardized uptake value measurements determined by ¹⁸F-FDG PET in malignant tumors. J Nucl Med. 2008;49:1804–1808.
OpenUrl Abstract/FREE Full Text
4.↵
1. Scheuermann JS,
2. Saffer JR,
3. Karp JS,
4. Levering AM,
5. Siegel BA
. Qualification of PET scanners for use in multicenter cancer clinical trials: the American College of Radiology Imaging Network experience. J Nucl Med. 2009;50:1187–1193.
OpenUrl Abstract/FREE Full Text
5.↵
1. Boellaard R,
2. Delgado-Bolton R,
3. Oyen WJ,
4. et al
. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging. 2015;42:328–354.
OpenUrl CrossRef PubMed
6.↵
1. Graham MM,
2. Wahl RL,
3. Hoffman JM,
4. et al
. Summary of the UPICT protocol for ¹⁸F-FDG PET/CT imaging in oncology clinical trials. J Nucl Med. 2015;56:955–961.
OpenUrl Abstract/FREE Full Text
7.↵
1. Boellaard R
. Standards for PET image acquisition and quantitative data analysis. J Nucl Med. 2009;50(suppl 1):11S–20S.
OpenUrl Abstract/FREE Full Text
8.
1. Doot RK,
2. Scheuermann JS,
3. Christian PE,
4. Karp JS,
5. Kinahan PE
. Instrumentation factors affecting variance and bias of quantifying tracer uptake with PET/CT. Med Phys. 2010;37:6035–6046.
OpenUrl CrossRef PubMed
9.↵
1. Fahey FH,
2. Kinahan PE,
3. Doot RK,
4. Kocak M,
5. Thurston H,
6. Poussaint TY
. Variability in PET quantitation within a multicenter consortium. Med Phys. 2010;37:3660–3666.
OpenUrl CrossRef PubMed
10.↵
1. de Langen AJ,
2. Vincent A,
3. Velasquez LM,
4. et al
. Repeatability of ¹⁸F-FDG uptake measurements in tumors: a metaanalysis. J Nucl Med. 2012;53:701–708.
OpenUrl Abstract/FREE Full Text
11.↵
1. Lodge MA
. Repeatability of SUV in oncologic ¹⁸F-FDG PET. J Nucl Med. 2017;58:523–532.
OpenUrl Abstract/FREE Full Text
12.↵
1. Kumar V,
2. Nath K,
3. Berman CG,
4. et al
. Variance of SUVs for FDG-PET/CT is greater in clinical practice than under ideal study settings. Clin Nucl Med. 2013;38:175–182.
OpenUrl CrossRef PubMed
13.↵
1. Velasquez LM,
2. Boellaard R,
3. Kollia G,
4. et al
. Repeatability of ¹⁸F-FDG PET in a multicenter phase I study of patients with advanced gastrointestinal malignancies. J Nucl Med. 2009;50:1646–1654.
OpenUrl Abstract/FREE Full Text
14.↵
1. Weber WA,
2. Gatsonis CA,
3. Mozley PD,
4. et al
. Repeatability of ¹⁸F-FDG PET/CT in advanced non-small cell lung cancer: prospective assessment in 2 multicenter trials. J Nucl Med. 2015;56:1137–1143.
OpenUrl Abstract/FREE Full Text
15.↵
1. Kurland BF,
2. Doot RK,
3. Linden HM,
4. Mankoff DA,
5. Kinahan PE
. Multicenter trials using ¹⁸F-fluorodeoxyglucose (FDG) PET to predict chemotherapy response: effects of differential measurement error and bias on power calculations for unselected and enrichment designs. Clin Trials. 2013;10:886–895.
OpenUrl CrossRef PubMed
16.
1. Doot RK,
2. Pierce LA II.,
3. Byrd D,
4. Elston B,
5. Allberg KC,
6. Kinahan PE
. Biases in multicenter longitudinal PET standardized uptake value measurements. Transl Oncol. 2014;7:48–54.
OpenUrl CrossRef PubMed
17.
1. Kurland BF,
2. Muzi M,
3. Peterson LM,
4. et al
. Multicenter clinical trials using ¹⁸F-FDG PET to measure early response to oncologic therapy: effects of injection-to-acquisition time variability on required sample size. J Nucl Med. 2016;57:226–230.
OpenUrl Abstract/FREE Full Text
18.↵
1. Doot RK,
2. Kurland BF,
3. Kinahan PE,
4. Mankoff DA
. Design considerations for using PET as a response measure in single site and multicenter clinical trials. Acad Radiol. 2012;19:184–190.
OpenUrl CrossRef PubMed
19.↵
1. Kamibayashi T,
2. Tsuchida T,
3. Demura Y,
4. et al
. Reproducibility of semi-quantitative parameters in FDG-PET using two different PET scanners: influence of attenuation correction method and examination interval. Mol Imaging Biol. 2008;10:162–166.
OpenUrl CrossRef PubMed
20.↵
QIBA Profile: FDG-PET/CT as an imaging biomarker measuring response to cancer therapy—version 1.13. QIBA website. http://qibawiki.rsna.org/images/1/1f/QIBA_FDG-PET_Profile_v113.pdf. Published November 18, 2016. Accessed December 27, 2018.
21.↵
1. Byrd DW,
2. Doot RK,
3. Allberg KC,
4. et al
. Evaluation of cross-calibrated ⁶⁸Ge/⁶⁸Ga phantoms for assessing PET/CT measurement bias in oncology imaging for single- and multicenter trials. Tomography. 2016;2:353–360.
OpenUrl
22.↵
1. Zimmerman BE,
2. Cessna JT
. Development of a traceable calibration methodology for solid ⁶⁸Ge/⁶⁸Ga sources used as a calibration surrogate for ¹⁸F in radionuclide activity calibrators. J Nucl Med. 2010;51:448–453.
OpenUrl Abstract/FREE Full Text
23.↵
1. Bland JM,
2. Altman DG
. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–160.
OpenUrl CrossRef PubMed
24.↵
1. Lin LI
. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45:255–268.
OpenUrl CrossRef PubMed
25.↵
1. Crawford SB,
2. Kosinski AS,
3. Lin HM,
4. Williamson JM,
5. Barnhart HX
. Computer programs for the concordance correlation coefficient. Comput Methods Programs Biomed. 2007;88:62–74.
OpenUrl CrossRef PubMed
26.↵
1. Boktor RR,
2. Walker G,
3. Stacey R,
4. Gledhill S,
5. Pitman AG
. Reference range for intrapatient variability in blood-pool and liver SUV for ¹⁸F-FDG PET. J Nucl Med. 2013;54:677–682.
OpenUrl Abstract/FREE Full Text
27.↵
1. Viner M,
2. Mercier G,
3. Hao F,
4. Malladi A,
5. Subramaniam RM
. Liver SULmean at FDG PET/CT: interreader agreement and impact of placement of volume of interest. Radiology. 2013;267:596–601.
OpenUrl CrossRef PubMed
28.↵
1. Hofheinz F,
2. Apostolova I,
3. Oehme L,
4. Kotzerke J,
5. van den Hoff J
. Test-retest variability in lesion SUV and lesion SUR in ¹⁸F-FDG PET: an analysis of data from two prospective multicenter trials. J Nucl Med. 2017;58:1770–1775.
OpenUrl Abstract/FREE Full Text
29.↵
1. Weber WA
. Use of PET for monitoring cancer therapy and for predicting outcome. J Nucl Med. 2005;46:983–995.
OpenUrl Abstract/FREE Full Text
30.↵
1. Lasnon C,
2. Salomon T,
3. Desmonts C,
4. et al
. Generating harmonized SUV within the EANM EARL accreditation program: software approach versus EARL-compliant reconstruction. Ann Nucl Med. 2017;31:125–134.
OpenUrl
31.↵
1. Aide N,
2. Lasnon C,
3. Veit-Haibach P,
4. Sera T,
5. Sattler B,
6. Boellaard R
. EANM/EARL harmonization strategies in PET quantification: from daily practice to multicentre oncological studies. Eur J Nucl Med Mol Imaging. 2017;44:17–31.
OpenUrl
32.↵
1. Kuhnert G,
2. Boellaard R,
3. Sterzer S,
4. et al
. Impact of PET/CT image reconstruction methods and liver uptake normalization strategies on quantitative image analysis. Eur J Nucl Med Mol Imaging. 2016;43:249–258.
OpenUrl CrossRef PubMed

Received for publication February 13, 2018.
Accepted for publication October 1, 2018.

In this issue

Download PDF

Article Alerts

Email Article

Citation Tools

Bookmark this article

Keywords

[1] 1.↵
Wahl RL,
Jacene H,
Kasamon Y,
Lodge MA
. From RECIST to PERCIST: evolving considerations for PET response criteria in solid tumors. J Nucl Med. 2009;50(suppl 1):122S–150S.
OpenUrl Abstract/FREE Full Text

[2] Wahl RL,

[3] Jacene H,

[4] Kasamon Y,

[5] Lodge MA

[6] 2.↵
Weber WA,
Ziegler SI,
Thodtmann R,
Hanauske AR,
Schwaiger M
. Reproducibility of metabolic measurements in malignant tumors using FDG PET. J Nucl Med. 1999;40:1771–1777.
OpenUrl Abstract/FREE Full Text

[7] Weber WA,

[8] Ziegler SI,

[9] Thodtmann R,

[10] Hanauske AR,

[11] Schwaiger M

[12] 3.↵
Nahmias C,
Wahl LM
. Reproducibility of standardized uptake value measurements determined by ¹⁸F-FDG PET in malignant tumors. J Nucl Med. 2008;49:1804–1808.
OpenUrl Abstract/FREE Full Text

[13] Nahmias C,

[14] Wahl LM

[15] 4.↵
Scheuermann JS,
Saffer JR,
Karp JS,
Levering AM,
Siegel BA
. Qualification of PET scanners for use in multicenter cancer clinical trials: the American College of Radiology Imaging Network experience. J Nucl Med. 2009;50:1187–1193.
OpenUrl Abstract/FREE Full Text

[16] Scheuermann JS,

[17] Saffer JR,

[18] Karp JS,

[19] Levering AM,

[20] Siegel BA

[21] 5.↵
Boellaard R,
Delgado-Bolton R,
Oyen WJ,
et al
. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging. 2015;42:328–354.
OpenUrl CrossRef PubMed

[22] Boellaard R,

[23] Delgado-Bolton R,

[24] Oyen WJ,

[25] et al

[26] 6.↵
Graham MM,
Wahl RL,
Hoffman JM,
et al
. Summary of the UPICT protocol for ¹⁸F-FDG PET/CT imaging in oncology clinical trials. J Nucl Med. 2015;56:955–961.
OpenUrl Abstract/FREE Full Text

[27] Graham MM,

[28] Wahl RL,

[29] Hoffman JM,

[30] et al

[31] 7.↵
Boellaard R
. Standards for PET image acquisition and quantitative data analysis. J Nucl Med. 2009;50(suppl 1):11S–20S.
OpenUrl Abstract/FREE Full Text

[32] Boellaard R

[33] 8.
Doot RK,
Scheuermann JS,
Christian PE,
Karp JS,
Kinahan PE
. Instrumentation factors affecting variance and bias of quantifying tracer uptake with PET/CT. Med Phys. 2010;37:6035–6046.
OpenUrl CrossRef PubMed

[34] Doot RK,

[35] Scheuermann JS,

[36] Christian PE,

[37] Karp JS,

[38] Kinahan PE

[39] 9.↵
Fahey FH,
Kinahan PE,
Doot RK,
Kocak M,
Thurston H,
Poussaint TY
. Variability in PET quantitation within a multicenter consortium. Med Phys. 2010;37:3660–3666.
OpenUrl CrossRef PubMed

[40] Fahey FH,

[41] Kinahan PE,

[42] Doot RK,

[43] Kocak M,

[44] Thurston H,

[45] Poussaint TY

[46] 10.↵
de Langen AJ,
Vincent A,
Velasquez LM,
et al
. Repeatability of ¹⁸F-FDG uptake measurements in tumors: a metaanalysis. J Nucl Med. 2012;53:701–708.
OpenUrl Abstract/FREE Full Text

[47] de Langen AJ,

[48] Vincent A,

[49] Velasquez LM,

[50] et al

[51] 11.↵
Lodge MA
. Repeatability of SUV in oncologic ¹⁸F-FDG PET. J Nucl Med. 2017;58:523–532.
OpenUrl Abstract/FREE Full Text

[52] Lodge MA

[53] 12.↵
Kumar V,
Nath K,
Berman CG,
et al
. Variance of SUVs for FDG-PET/CT is greater in clinical practice than under ideal study settings. Clin Nucl Med. 2013;38:175–182.
OpenUrl CrossRef PubMed

[54] Kumar V,

[55] Nath K,

[56] Berman CG,

[57] et al

[58] 13.↵
Velasquez LM,
Boellaard R,
Kollia G,
et al
. Repeatability of ¹⁸F-FDG PET in a multicenter phase I study of patients with advanced gastrointestinal malignancies. J Nucl Med. 2009;50:1646–1654.
OpenUrl Abstract/FREE Full Text

[59] Velasquez LM,

[60] Boellaard R,

[61] Kollia G,

[62] et al

[63] 14.↵
Weber WA,
Gatsonis CA,
Mozley PD,
et al
. Repeatability of ¹⁸F-FDG PET/CT in advanced non-small cell lung cancer: prospective assessment in 2 multicenter trials. J Nucl Med. 2015;56:1137–1143.
OpenUrl Abstract/FREE Full Text

[64] Weber WA,

[65] Gatsonis CA,

[66] Mozley PD,

[67] et al

[68] 15.↵
Kurland BF,
Doot RK,
Linden HM,
Mankoff DA,
Kinahan PE
. Multicenter trials using ¹⁸F-fluorodeoxyglucose (FDG) PET to predict chemotherapy response: effects of differential measurement error and bias on power calculations for unselected and enrichment designs. Clin Trials. 2013;10:886–895.
OpenUrl CrossRef PubMed

[69] Kurland BF,

[70] Doot RK,

[71] Linden HM,

[72] Mankoff DA,

[73] Kinahan PE

[74] 16.
Doot RK,
Pierce LA II.,
Byrd D,
Elston B,
Allberg KC,
Kinahan PE
. Biases in multicenter longitudinal PET standardized uptake value measurements. Transl Oncol. 2014;7:48–54.
OpenUrl CrossRef PubMed

[75] Doot RK,

[76] Pierce LA II.,

[77] Byrd D,

[78] Elston B,

[79] Allberg KC,

[80] Kinahan PE

[81] 17.
Kurland BF,
Muzi M,
Peterson LM,
et al
. Multicenter clinical trials using ¹⁸F-FDG PET to measure early response to oncologic therapy: effects of injection-to-acquisition time variability on required sample size. J Nucl Med. 2016;57:226–230.
OpenUrl Abstract/FREE Full Text

[82] Kurland BF,

[83] Muzi M,

[84] Peterson LM,

[85] et al

[86] 18.↵
Doot RK,
Kurland BF,
Kinahan PE,
Mankoff DA
. Design considerations for using PET as a response measure in single site and multicenter clinical trials. Acad Radiol. 2012;19:184–190.
OpenUrl CrossRef PubMed

[87] Doot RK,

[88] Kurland BF,

[89] Kinahan PE,

[90] Mankoff DA

[91] 19.↵
Kamibayashi T,
Tsuchida T,
Demura Y,
et al
. Reproducibility of semi-quantitative parameters in FDG-PET using two different PET scanners: influence of attenuation correction method and examination interval. Mol Imaging Biol. 2008;10:162–166.
OpenUrl CrossRef PubMed

[92] Kamibayashi T,

[93] Tsuchida T,

[94] Demura Y,

[95] et al

[96] 20.↵
QIBA Profile: FDG-PET/CT as an imaging biomarker measuring response to cancer therapy—version 1.13. QIBA website. http://qibawiki.rsna.org/images/1/1f/QIBA_FDG-PET_Profile_v113.pdf. Published November 18, 2016. Accessed December 27, 2018.

[97] 21.↵
Byrd DW,
Doot RK,
Allberg KC,
et al
. Evaluation of cross-calibrated ⁶⁸Ge/⁶⁸Ga phantoms for assessing PET/CT measurement bias in oncology imaging for single- and multicenter trials. Tomography. 2016;2:353–360.
OpenUrl

[98] Byrd DW,

[99] Doot RK,

[100] Allberg KC,

[101] et al

[102] 22.↵
Zimmerman BE,
Cessna JT
. Development of a traceable calibration methodology for solid ⁶⁸Ge/⁶⁸Ga sources used as a calibration surrogate for ¹⁸F in radionuclide activity calibrators. J Nucl Med. 2010;51:448–453.
OpenUrl Abstract/FREE Full Text

[103] Zimmerman BE,

[104] Cessna JT

[105] 23.↵
Bland JM,
Altman DG
. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–160.
OpenUrl CrossRef PubMed

[106] Bland JM,

[107] Altman DG

[108] 24.↵
Lin LI
. A concordance correlation coefficient to evaluate reproducibility. Biometrics. 1989;45:255–268.
OpenUrl CrossRef PubMed

[109] Lin LI

[110] 25.↵
Crawford SB,
Kosinski AS,
Lin HM,
Williamson JM,
Barnhart HX
. Computer programs for the concordance correlation coefficient. Comput Methods Programs Biomed. 2007;88:62–74.
OpenUrl CrossRef PubMed

[111] Crawford SB,

[112] Kosinski AS,

[113] Lin HM,

[114] Williamson JM,

[115] Barnhart HX

[116] 26.↵
Boktor RR,
Walker G,
Stacey R,
Gledhill S,
Pitman AG
. Reference range for intrapatient variability in blood-pool and liver SUV for ¹⁸F-FDG PET. J Nucl Med. 2013;54:677–682.
OpenUrl Abstract/FREE Full Text

[117] Boktor RR,

[118] Walker G,

[119] Stacey R,

[120] Gledhill S,

[121] Pitman AG

[122] 27.↵
Viner M,
Mercier G,
Hao F,
Malladi A,
Subramaniam RM
. Liver SULmean at FDG PET/CT: interreader agreement and impact of placement of volume of interest. Radiology. 2013;267:596–601.
OpenUrl CrossRef PubMed

[123] Viner M,

[124] Mercier G,

[125] Hao F,

[126] Malladi A,

[127] Subramaniam RM

[128] 28.↵
Hofheinz F,
Apostolova I,
Oehme L,
Kotzerke J,
van den Hoff J
. Test-retest variability in lesion SUV and lesion SUR in ¹⁸F-FDG PET: an analysis of data from two prospective multicenter trials. J Nucl Med. 2017;58:1770–1775.
OpenUrl Abstract/FREE Full Text

[129] Hofheinz F,

[130] Apostolova I,

[131] Oehme L,

[132] Kotzerke J,

[133] van den Hoff J

[134] 29.↵
Weber WA
. Use of PET for monitoring cancer therapy and for predicting outcome. J Nucl Med. 2005;46:983–995.
OpenUrl Abstract/FREE Full Text

[135] Weber WA

[136] 30.↵
Lasnon C,
Salomon T,
Desmonts C,
et al
. Generating harmonized SUV within the EANM EARL accreditation program: software approach versus EARL-compliant reconstruction. Ann Nucl Med. 2017;31:125–134.
OpenUrl

[137] Lasnon C,

[138] Salomon T,

[139] Desmonts C,

[140] et al

[141] 31.↵
Aide N,
Lasnon C,
Veit-Haibach P,
Sera T,
Sattler B,
Boellaard R
. EANM/EARL harmonization strategies in PET quantification: from daily practice to multicentre oncological studies. Eur J Nucl Med Mol Imaging. 2017;44:17–31.
OpenUrl

[142] Aide N,

[143] Lasnon C,

[144] Veit-Haibach P,

[145] Sera T,

[146] Sattler B,

[147] Boellaard R

[148] 32.↵
Kuhnert G,
Boellaard R,
Sterzer S,
et al
. Impact of PET/CT image reconstruction methods and liver uptake normalization strategies on quantitative image analysis. Eur J Nucl Med Mol Imaging. 2016;43:249–258.
OpenUrl CrossRef PubMed

[149] Kuhnert G,

[150] Boellaard R,

[151] Sterzer S,

[152] et al

Main menu

User menu

Search

Test–Retest Reproducibility of ¹⁸F-FDG PET/CT Uptake in Cancer Patients Within a Qualified and Calibrated Local Network

Abstract

MATERIALS AND METHODS

Multicenter Consortium and PET/CT Scanner Qualification

Patient Eligibility

¹⁸F-FDG Imaging

Image Analysis

Statistical Analysis

RESULTS

Lesion SUV_max Difference

Predictors of Test–Retest Differences in Lesion SUV_max

Liver SUV_mean

DISCUSSION

CONCLUSION

DISCLOSURE

Acknowledgments

Footnotes

REFERENCES

In this issue

Citation Manager Formats

Related Articles

Cited By...

More in this TOC Section

Oncology

Clinical

Similar Articles

Keywords

Main menu

User menu

Search

Test–Retest Reproducibility of 18F-FDG PET/CT Uptake in Cancer Patients Within a Qualified and Calibrated Local Network

Abstract

MATERIALS AND METHODS

Multicenter Consortium and PET/CT Scanner Qualification

Patient Eligibility

18F-FDG Imaging

Image Analysis

Statistical Analysis

RESULTS

Lesion SUVmax Difference

Predictors of Test–Retest Differences in Lesion SUVmax

Liver SUVmean

DISCUSSION

CONCLUSION

DISCLOSURE

Acknowledgments

Footnotes

REFERENCES

In this issue

Citation Manager Formats

Jump to section

Related Articles

Cited By...

More in this TOC Section

Oncology

Clinical

Similar Articles

Keywords

Test–Retest Reproducibility of ¹⁸F-FDG PET/CT Uptake in Cancer Patients Within a Qualified and Calibrated Local Network

¹⁸F-FDG Imaging

Lesion SUV_max Difference

Predictors of Test–Retest Differences in Lesion SUV_max

Liver SUV_mean