RT Journal Article
SR Electronic
T1 Estimating model observer performance with small image ensembles
JF Journal of Nuclear Medicine
JO J Nucl Med
FD Society of Nuclear Medicine
SP 540
OP 540
VO 56
IS supplement 3
A1 Fatma E. Elshahaby
A1 Michael Ghaly
A1 Xin Li
A1 Abhinav Jha
A1 Eric Frey
YR 2015
UL http://jnm.snmjournals.org/content/56/supplement_3/540.abstract
AB 540
Objectives: Task-based assessment of image quality requires large ensembles of images with known truth status, which can be difficult and computationally expensive to obtain. Our goal was to evaluate several channelized model observers in terms of their ability to estimate performance from small ensembles.
Methods: We used a population of realistic myocardial perfusion SPECT projections with uptake variability but without anatomic or signal variability. An anterolateral defect with 25% extent and 10% severity was used. Images were reconstructed using filtered backprojection followed by a 3-D Butterworth filter (order 8; cutoff frequencies from 0.08 to 0.24 cycles/pixel). Standard cardiac post-processing methods were used to generate short-axis images. A set of six octave-wide, rotationally symmetric difference-of-mesa (DOM) channels was applied to the slice containing the defect centroid, centered at the centroid position. The resulting feature vectors were analyzed using the channelized Hotelling observer (CHO), channelized linear discriminant (CLD), and channelized quadratic discriminant (CQD). Each observer was trained and tested using a leave-one-out method. Areas under the ROC curve (AUCs) were estimated using the ROCkit software. The AUC obtained with an ensemble of 4,000 images served as the gold standard. All ensembles contained equal numbers of defect-present and defect-absent images. Observers were compared in terms of the bias and mean squared error (MSE) of the AUC for an ensemble of 40 images.
Results: All observers performed similarly with 4,000 images, indicating that this ensemble size is a suitable gold standard. Smaller ensembles resulted in negatively biased AUCs for all observers.
For 40 images at a cutoff of 0.1 cycles/pixel, the (bias, MSE) pairs were (-0.15 ± 0.08, 0.04 ± 0.03) for the CHO and (-0.04 ± 0.05, 0.006 ± 0.01) for the CLD. The CQD was intermediate on both measures. For small ensembles, the CLD best preserved the performance ranking of the filter cutoffs, followed by the CQD.
Conclusions: For small ensembles, the CLD provided smaller bias and MSE and better preserved rankings than the CHO or CQD; it thus might be preferable for task-based optimization and evaluation studies.
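The Methods describe six octave-wide, rotationally symmetric difference-of-mesa (DOM) channels. A minimal NumPy sketch of such a channel set, built in the frequency domain as concentric annular passbands whose upper edge is twice the lower edge, might look like the following. The image size, the starting frequency `f_start`, and the helper names `dom_channels` and `channelize` are hypothetical choices for illustration, not parameters reported in the abstract.

```python
import numpy as np


def dom_channels(size=128, n_channels=6, f_start=1.0 / 128.0):
    """Octave-wide, rotationally symmetric difference-of-mesa (DOM) channels.

    Channel j passes radial frequencies in [f_start * 2**j, f_start * 2**(j + 1))
    cycles/pixel, i.e. the difference of two concentric "mesa" (disk) functions
    in the frequency domain. size and f_start are illustrative assumptions.
    Returns an array of shape (n_channels, size * size): one spatial-domain
    channel profile per row, centered on the image center.
    """
    freqs = np.fft.fftfreq(size)  # frequency samples in cycles/pixel
    fx, fy = np.meshgrid(freqs, freqs, indexing="ij")
    rho = np.hypot(fx, fy)  # radial frequency
    rows = []
    for j in range(n_channels):
        lo, hi = f_start * 2**j, f_start * 2 ** (j + 1)
        passband = ((rho >= lo) & (rho < hi)).astype(float)
        # The passband is real and even, so its inverse FFT is real; fftshift
        # moves the profile's center from pixel (0, 0) to the image center.
        profile = np.fft.fftshift(np.fft.ifft2(passband).real)
        rows.append(profile.ravel())
    return np.array(rows)


def channelize(images, channels):
    """Map flattened images (n_images, size * size) to channel outputs (n_images, n_channels).

    Assumes each image has been shifted so the defect centroid sits at the image center.
    """
    return images @ channels.T
```

With `f_start = 1/128`, the six octaves span 1/128 to 0.5 cycles/pixel (the Nyquist frequency). Because the passbands do not overlap, the resulting spatial channels are mutually orthogonal, which is one reason octave-wide frequency channels are convenient for feature extraction.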
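The abstract states that each observer was trained and tested with a leave-one-out method and that AUCs were estimated with ROCkit. A rough sketch of the CHO branch of that pipeline is given below, under stated assumptions: the Hotelling template uses the average of the two class covariance matrices (a common choice, assumed here), and the AUC is computed with the nonparametric Wilcoxon/Mann-Whitney statistic as a stand-in for ROCkit's binormal fit. The CLD and CQD are not implemented, and the function names are hypothetical.

```python
import numpy as np


def cho_loo_scores(v0, v1):
    """Leave-one-out channelized Hotelling observer (CHO) test statistics.

    v0: (n0, c) channel outputs for defect-absent images.
    v1: (n1, c) channel outputs for defect-present images.
    For each held-out image, the Hotelling template w = S^{-1} (mean1 - mean0)
    is estimated from all remaining images, with S taken as the average of the
    two class covariance matrices (an assumed choice), and then applied to
    the held-out image.
    """
    v = np.vstack([v0, v1])
    labels = np.concatenate([np.zeros(len(v0)), np.ones(len(v1))])
    scores = np.empty(len(v))
    for i in range(len(v)):
        keep = np.arange(len(v)) != i
        train, lab = v[keep], labels[keep]
        absent, present = train[lab == 0], train[lab == 1]
        s = 0.5 * (np.cov(absent, rowvar=False) + np.cov(present, rowvar=False))
        w = np.linalg.solve(s, present.mean(axis=0) - absent.mean(axis=0))
        scores[i] = w @ v[i]
    return scores[labels == 0], scores[labels == 1]


def wilcoxon_auc(t0, t1):
    """Nonparametric AUC: P(t1 > t0) + 0.5 * P(t1 == t0) over all pairs."""
    diff = np.asarray(t1)[:, None] - np.asarray(t0)[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)


# Usage on synthetic (not study) features: 20 + 20 images, 6 channels.
rng = np.random.default_rng(0)
v0 = rng.normal(size=(20, 6))        # hypothetical defect-absent features
v1 = rng.normal(size=(20, 6)) + 0.5  # hypothetical defect-present features
t0, t1 = cho_loo_scores(v0, v1)
print(wilcoxon_auc(t0, t1))
```

Holding out each image before estimating the template keeps the test statistic for that image independent of its own training contribution. With only 40 images, the 6 x 6 covariance estimate is noisy, which is consistent with the abstract's observation that small ensembles degrade the estimated AUC.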
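The observers were compared by the bias and MSE of the small-ensemble AUC relative to the 4,000-image gold standard. Assuming these are computed over a set of repeated small-ensemble AUC estimates (the abstract does not say how the reported ± values were obtained), the two measures reduce to the sketch below. The helper name and the input AUC values are hypothetical.

```python
import numpy as np


def auc_bias_mse(auc_estimates, auc_gold):
    """Bias and mean squared error of AUC estimates relative to a gold-standard AUC.

    auc_estimates: AUCs from repeated small-ensemble experiments (e.g. several
    independent 40-image ensembles); auc_gold: AUC from the large ensemble.
    """
    err = np.asarray(auc_estimates, dtype=float) - auc_gold
    bias = err.mean()
    mse = np.mean(err**2)  # equals bias**2 + err.var()
    return bias, mse


# Hypothetical values for illustration only.
bias, mse = auc_bias_mse([0.70, 0.80, 0.75], auc_gold=0.85)
# bias ≈ -0.10, mse ≈ 0.0117
```

Because the MSE equals the squared bias plus the variance of the estimates, an observer can have a smaller MSE either by being less biased or by being less variable across ensembles.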