PT - JOURNAL ARTICLE
AU - Fatma E. Elshahaby
AU - Michael Ghaly
AU - Xin Li
AU - Abhinav Jha
AU - Eric Frey
TI - Estimating model observer performance with small image ensembles
DP - 2015 May 01
TA - Journal of Nuclear Medicine
PG - 540--540
VI - 56
IP - supplement 3
4099 - http://jnm.snmjournals.org/content/56/supplement_3/540.short
4100 - http://jnm.snmjournals.org/content/56/supplement_3/540.full
SO - J Nucl Med 2015 May 01; 56
AB - 540 Objectives Task-based assessment of image quality requires large ensembles of images with known truth status, which can be hard and computationally expensive to obtain. Our goal was to evaluate various channelized model observers in terms of their ability to estimate performance using small ensembles. Methods We used a population of realistic myocardial perfusion SPECT projections with uptake but without anatomic or signal variability. An anterolateral defect with 25% extent and 10% severity was used. Images were reconstructed using filtered backprojection followed by a 3-D Butterworth filter (order 8 and cutoffs from 0.08 to 0.24/pix). Standard cardiac post-processing methods were used to generate short axis images. A set of 6 octave-wide rotationally symmetric difference-of-mesa channels was applied in the slice containing and at the position of the defect centroid. The resulting feature vectors were analyzed using the channelized Hotelling observer (CHO), linear discriminant (CLD), and quadratic discriminant (CQD). Each observer was trained and tested using a leave-one-out method. AUC values were estimated using the ROCkit software. The AUC value for an ensemble size of 4,000 served as a gold standard. All ensembles had equal numbers of defect present and absent images. Observers were compared in terms of bias and MSE of AUC values for an ensemble of 40 images. Results The performance of the observers was similar at 4,000 images, indicating this is a suitable gold standard.
Smaller sample sizes resulted in negatively biased AUCs for all observers. For 40 images at cutoff 0.1/pix, the bias and MSE pairs were (-0.15±0.08, 0.04±0.03) and (-0.04±0.05, 0.006±0.01) for the CHO and CLD, respectively. The CQD was intermediate for both measures. The performance rankings for the filter cutoffs were preserved best for small ensembles by the CLD, followed by the CQD. Conclusions The CLD provided lower bias and MSE and preserved rankings better for small ensembles than the CHO or CQD, and thus might be preferable for task-based optimization and evaluation studies.
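
The evaluation pipeline described in the abstract (leave-one-out training and testing on channel feature vectors, AUC estimation, then bias and MSE of small-ensemble AUCs against a large-ensemble reference) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the 6-dimensional feature vectors are synthetic Gaussian draws rather than channel outputs from SPECT images, the observer is a generic linear template built from a pooled covariance and is not claimed to match the authors' CHO, CLD, or CQD implementations, and a nonparametric Mann-Whitney AUC stands in for the ROCkit software. All function names and parameter values are hypothetical.

```python
import numpy as np


def mann_whitney_auc(absent, present):
    """Nonparametric AUC: P(present > absent) + 0.5 * P(tie)."""
    absent = np.asarray(absent, dtype=float)
    present = np.asarray(present, dtype=float)
    diff = present[:, None] - absent[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def loo_test_statistics(features, labels):
    """Leave-one-out: train a linear template without image i, then apply it to image i."""
    n = len(labels)
    stats = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        x, y = features[keep], labels[keep]
        x0, x1 = x[y == 0], x[y == 1]
        # Pooled covariance of the two classes (assumed equal weighting).
        s = 0.5 * (np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False))
        # Linear template: covariance-whitened difference of class means.
        w = np.linalg.solve(s, x1.mean(axis=0) - x0.mean(axis=0))
        stats[i] = w @ features[i]
    return stats


def loo_auc(features, labels):
    """AUC of the leave-one-out test statistics."""
    t = loo_test_statistics(features, labels)
    return mann_whitney_auc(t[labels == 0], t[labels == 1])


def simulate_ensemble(rng, n_per_class, shift):
    """Synthetic stand-in for channel feature vectors (not the study's data)."""
    absent = rng.standard_normal((n_per_class, shift.size))
    present = rng.standard_normal((n_per_class, shift.size)) + shift
    features = np.vstack([absent, present])
    labels = np.repeat([0, 1], n_per_class)
    return features, labels


def bias_and_mse(estimates, reference):
    """Bias and mean squared error of AUC estimates relative to a reference AUC."""
    err = np.asarray(estimates, dtype=float) - reference
    return float(err.mean()), float(np.mean(err ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shift = np.full(6, 0.4)  # hypothetical signal strength in each of 6 channels
    # Large ensemble as the reference; 40-image ensembles (20 per class) as the test case.
    reference = loo_auc(*simulate_ensemble(rng, 1000, shift))
    small = [loo_auc(*simulate_ensemble(rng, 20, shift)) for _ in range(50)]
    bias, mse = bias_and_mse(small, reference)
    print(f"reference AUC={reference:.3f}  bias={bias:+.3f}  MSE={mse:.4f}")
```

The sketch shows only the structure of the comparison; the numbers it prints depend on the synthetic settings and say nothing about the results reported in the abstract.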