Abstract
1224
Objectives Quantitative measures of [18F]-FDG uptake from PET scans show promise as early response indicators and predictive biomarkers. However, in multicenter trials substantial measurement error and bias may arise from differences in scanner calibration, patient factors, and scanning and image analysis procedures. We used simulation-based power calculations to inform study design, weighing the faster accrual of multicenter trials against their increased measurement error.
Methods Reference standardized uptake value (SUV) data were sampled with replacement from a single-institution trial using FDG PET to measure breast cancer response to neoadjuvant chemotherapy; response data were simulated from a logistic regression model predicting response from mid-therapy percent change in SUV. The impact of multicenter conduct was simulated by adding measurement error and bias to the reference SUVs. We examined different scenarios for the added bias and error: 20%-40% measurement error, 0%-40% bias, and fixed bias/error values. The proportion of patients recruited from sites with higher vs. lower additional bias/error varied from 25% to 75%.
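The simulation logic described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the reference trial data are not available here, so baseline SUV and true percent change are drawn from hypothetical distributions instead of being resampled, the response model coefficients are invented for illustration, and the added error is assumed to be multiplicative and log-normal on each scan. The function names and parameters (fit_logistic_slope, simulate_power, cv_low, cv_high, frac_high) are likewise assumptions. Power is estimated as the fraction of simulated trials in which a two-sided Wald test on the logistic slope is significant at the 0.05 level.

```python
import math
import random


def fit_logistic_slope(x, y, iters=25):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x.

    Returns (b1, se_b1), or None if the information matrix is
    near-singular (for example under complete separation).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, b0 + b1 * xi))  # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            return None
        # Newton step: beta += inverse(information) * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b1, math.sqrt(h00 / det)


def simulate_power(n=100, cv_low=0.0, cv_high=0.0, frac_high=0.0,
                   n_sims=300, seed=1):
    """Estimate power to detect an SUV-change/response association.

    cv_low / cv_high: added error (log-scale SD) at lower- and higher-error
    sites; frac_high: expected fraction of patients from higher-error sites.
    All distributions and coefficients below are hypothetical stand-ins.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x_obs, y = [], []
        for _ in range(n):
            base = rng.lognormvariate(math.log(8.0), 0.5)     # hypothetical baseline SUV
            true_change = max(rng.gauss(-0.40, 0.25), -0.95)  # hypothetical fractional change
            mid = base * (1.0 + true_change)
            # Response depends on the TRUE change (assumed slope of -3).
            p_resp = 1.0 / (1.0 + math.exp(3.0 * (true_change + 0.40)))
            y.append(1 if rng.random() < p_resp else 0)
            # Each patient is assigned at random to a higher- or lower-error site.
            cv = cv_high if rng.random() < frac_high else cv_low
            base_obs = base * math.exp(rng.gauss(0.0, cv))
            mid_obs = mid * math.exp(rng.gauss(0.0, cv))
            x_obs.append(mid_obs / base_obs - 1.0)  # observed fractional change
        fit = fit_logistic_slope(x_obs, y)
        if fit is not None and abs(fit[0] / fit[1]) > 1.96:
            hits += 1
    return hits / n_sims
```

Under these assumptions, simulate_power() gives a no-added-error reference, and simulate_power(cv_low=0.2, cv_high=0.4, frac_high=0.25) mimics a scenario with 20% added error for about 3/4 of patients and 40% for the remainder. Because the inputs are synthetic, the resulting values should not be expected to reproduce the power figures reported in this abstract.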
Results Reference power (from source data with no added error) was 0.92 for n=100 to detect an association between percentage change in SUV and response. With moderate (20%) additional measurement error for 3/4, 1/2, and 1/4 of measurements and 40% for the remainder, power was 0.70, 0.61, and 0.53, respectively. The reduction in study power was similar for other forms of measurement error (bias as a percentage of true value, absolute error, and absolute bias). Enrichment designs, which recruit additional patients by not conducting a second scan in patients with low baseline SUV, did not lead to greater power.
Conclusions Rigorous characterization and quantification of bias and error, and harmonization of standards, are needed for accurate design of effective multicenter clinical trials incorporating PET.
Research Support NIH grant U01CA148131, NCI Contract 24XS036-00