Abstract
We studied the effects of reduced 18F-FDG injection activity on interpretation of positron emission mammography (PEM) images and compared image interpretation between 2 postinjection imaging times. Methods: We performed a receiver-operating-characteristic (ROC) study using PEM images reconstructed with different count levels expected from injected activities between 23 and 185 MBq. Thirty patients received 2 PEM scans at postinjection times of 60 and 120 min. Half of the patients were scanned with a standard protocol; the others received one-half of the standard activity. Images were reconstructed using 100%, 50%, and 25% of the total counts acquired. Eight radiologists used a 5-point confidence scale to score 232 PEM images for the presence of up to 3 malignant lesions. Paired images were analyzed with conditional logistic regression and ROC analysis to investigate changes in interpretation. Results: There was a trend for increasing lesion detection sensitivity with increased image counts: odds ratios were 2.2 (P = 0.01) and 1.9 (P = 0.04) per doubling of image counts for 60- and 120-min uptake images, respectively, without significant difference between time points (P = 0.7). The area under the ROC curve (AUC) was highest for the 100%-count, 60-min images (0.83 vs. 0.75 for 50%-counts, P = 0.02). The 120-min images had a similar trend but did not reach statistical significance (AUC = 0.79 vs. 0.73, P = 0.1). Our data did not yield significant trends between specificity and image counts. Lesion-to-background ratios increased between 60- and 120-min scans (P < 0.001). Conclusion: Reducing the image counts relative to the standard protocol decreased diagnostic accuracy. The increase in lesion-to-background ratio between 60- and 120-min uptake times was not enough to improve detection sensitivity in this study, perhaps in part due to fewer counts in the later scan.
Motivated by remaining challenges in diagnosis, staging, and management of breast cancer, small, high-resolution PET scanners dedicated to breast imaging have been investigated since the 1990s (1–13). Dedicated breast PET systems (positron emission mammography [PEM]) have spatial resolutions better than whole-body (WB) PET scanners by factors of 2–4 and are much smaller than WB PET scanners, allowing placement of the detectors close to the breast, thus increasing geometric detection efficiency for annihilation photons relative to WB PET. In theory, increased detector efficiency allows for lower injected activities or shorter scan times while maintaining a fixed image noise level.
Increased detector efficiency adds to the number of detected coincidence counts during a scan, which is the underlying metric determining inherent image noise. The activity injected into the patient (Ainj), uptake time (Tup), scan duration (Ts), and detector efficiency are the primary factors determining the number of PEM scan counts. The PEM Flex Solo II scanner (PEM Flex; CMR Naviscan) consists of 2 bar detectors (6 × 16 cm imaging area) that scan in unison along the 6-cm dimension to cover 16 × 24 cm. Photon detection efficiency is fundamentally limited by the small detector size, which counteracts the increased geometric efficiency obtained by proximity to the subject.
Recommendations for 18F-FDG injected activity have been established for WB PET in the United States at 370–740 MBq (14). Equivalent guidelines have yet to be established for PEM scanning. A protocol of 370-MBq injection, 45–60 min of uptake, and 10-min Ts was used in early studies that began with an earlier version of the scanner (PEM Flex Solo I) (5). The PEM Flex Solo II model uses thicker scintillation crystals, and, in consultation with the system’s vendor, we at the Swedish Cancer Institute adopted a PEM protocol of Ainj = 370 MBq, 60-min uptake, and 7-min Ts. We were interested in whether this protocol provided advantages over the use of lower Ainj.
In a previous study, we found that noise in PEM Flex images decreased only slowly as the activity concentration was increased above 2 kBq/mL, which is roughly one-half of the concentration expected in a typical patient when using the standard injection and uptake protocol (15). That study also showed a slow decrease in detection sensitivity with lowered image count density, with sensitivity remaining above 90% for a lesion diameter of 7.8 mm or more for image noise corresponding to Ainj down to 100 MBq. Furthermore, specificity was 98% for all activities tested (46–370 MBq), suggesting that statistical noise texture alone does not lead to false-positive findings. One limitation of the phantom study was uniform activity in the background; normal breast 18F-FDG uptake varies with tissue type, creating a heterogeneous background (16).
In the present study, our goal was to evaluate the effect of lowering Ainj on the interpretation of PEM images. To study this, we investigated lesion detection sensitivity and specificity on PEM images with different image count levels corresponding to conventional and reduced Ainj. We repeated the analysis for images acquired at 2 postinjection time points.
MATERIALS AND METHODS
Patient Cohort
This study was approved by an institutional review board and was Health Insurance Portability and Accountability Act compliant. Consecutive patients who met the study criteria and provided informed written consent were imaged with the PEM Flex between August 2010 and October 2012. We enrolled 30 patients: 10 each with invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), and ductal carcinoma in situ (DCIS). For this study, we used a lower-dose alternative that is equivalent in image counts to our standard PEM protocol, as described below. The first 5 patients of each pathology were imaged with this equivalent of the standard protocol; the remaining 5 of each pathology received one-half of that injected activity.
Disease type was confirmed by core biopsy, and classification was based on routine pathologic histologic assessment of the biopsy samples using the standard care at our institution. PEM imaging was performed before surgical treatment.
Control images were taken from PEM scans of the disease-free contralateral breast of the study subjects (contralateral controls); however, not all study subjects had a contralateral control image. Additional control images were taken from a database of prior institutional review board–approved PEM studies (unmatched controls) (17). We included the additional controls to better balance the ratio of disease-to-normal scans. Normal breast parenchyma in controls was confirmed by negative physical examinations, negative mammograms, and negative contrast-enhanced MRI after 18–36 mo of follow-up. Eight contralateral controls were also negative on pathology after a prophylactic mastectomy.
PEM Scanner
The PEM Flex scanner is a limited-angle tomosynthesis system yielding 12 image slices, 16.3 × 24.0 cm with 1.2-mm pixel size (18,19). Slice thickness is one-twelfth of the distance between the 2 detectors, which is adjusted for each patient to immobilize the breast with mild compression. Spatial resolution is 2.4 mm in full width at half maximum on image slices reconstructed in the standard mode that we used in this work. The image reconstruction algorithm is maximum likelihood expectation maximization with no user-adjustable parameters. No corrections are made for scattered, attenuated, or accidental coincidence events in this system. Details of the PEM Flex hardware and performance characteristics are found in previous publications (15,18,19).
PEM Scanning
Study patients were imaged at 60 and 120 min after injection of 18F-FDG. Imaging was performed in the mediolateral oblique (MLO) orientation only. Patient preparation requirements were fasting at least 12 h, blood glucose less than 150 mg/dL, no history of diabetes, and limited exercise for 24 h before scan.
The first half of patients for each pathology type received an Ainj of 185 MBq and a Ts of 14 min, which yielded image counts equivalent to our standard PEM protocol of 370 MBq for Ainj and 7 min for Ts by virtue of the same Ainj × Ts product and scanner counting linearity (15). We refer to this as the standard injected activity (ASTD). For the second half of the study, we used an Ainj of 92.5 MBq while keeping the same Ts of 14 min (ASTD/2).
Control images were acquired using different Ts and tracer Tup, resulting in different image noise characteristics relative to case images. To classify image noise levels, we used the effective injected activity (Aeff) for control and case images, obtained by adjusting the actual Ainj to match a protocol with a 7-min acquisition duration and a 60-min Tup using the following equation:
\[ A_{\mathrm{eff}} = A_{\mathrm{inj}} \times \frac{T_{s}}{7\ \mathrm{min}} \times 2^{-(T_{\mathrm{up}} - 60\ \mathrm{min})/T_{1/2}} \qquad \text{(Eq. 1)} \]
where T1/2 is the 18F half-life.
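The adjustment described above can be illustrated with a short calculation. This is a minimal sketch, assuming that Aeff scales linearly with scan duration (relative to 7 min) and with 18F physical decay over the uptake time (relative to 60 min); the function name, argument names, and the half-life constant of 109.77 min are illustrative choices, not taken from the study software.

```python
# Assumed form of the effective-activity adjustment:
# linear in scan duration, exponential in uptake-time difference.
F18_HALF_LIFE_MIN = 109.77  # physical half-life of 18F, in minutes


def effective_activity(a_inj_mbq, t_scan_min, t_uptake_min,
                       ref_scan_min=7.0, ref_uptake_min=60.0):
    """Scale an injected activity (MBq) to a 7-min scan at 60-min uptake."""
    scan_factor = t_scan_min / ref_scan_min
    decay_factor = 2.0 ** (-(t_uptake_min - ref_uptake_min) / F18_HALF_LIFE_MIN)
    return a_inj_mbq * scan_factor * decay_factor


# The two study cohorts at the 60-min time point (14-min scans).
print(effective_activity(185.0, 14.0, 60.0))  # 370.0
print(effective_activity(92.5, 14.0, 60.0))   # 185.0
```

Under these assumptions, the 185-MBq, 14-min protocol maps to the same effective activity as the 370-MBq, 7-min standard protocol, which is the equivalence the study design relies on.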
Contralateral controls were acquired after the 120-min ipsilateral scan and included both MLO and craniocaudal views. All images/views were separately randomized, and interpreters interpreted only a single view at a time. Interpreters were not made aware that disease cases were MLO only. Unmatched controls taken from the earlier study followed the protocol specified for that study, in which PEM scanning followed a clinically ordered WB PET/CT. That protocol specified Ainj = 592 MBq, a PEM acquisition of Ts = 7 min, and a Tup that varied because the PEM scan was performed after the WB PET/CT examination (17).
Image Generation
From each PEM scan, we generated 3 PEM images with different count levels to represent different injected activities. This allowed us to study image interpretation differences in matched pairs differing only in the image count level. In a previously validated offline process (15), we subtracted events from the list-mode data file saved by the PEM Flex scanner to keep only 50% and 25% of the acquired data. Subtraction was done uniformly throughout the entire data file. The count-subtracted list-mode files were reconstructed in the same manner as the original file that contained all counts. The result was a triplet (100%, 50%, and 25% count levels) of the same patient image but with differing levels of statistical noise. Reduced-count images were assigned Aeff that were reduced relative to Ainj by the fraction of counts subtracted, resulting in 4 categories of Aeff (ASTD, ASTD/2, ASTD/4, and ASTD/8) among the 2 cohorts receiving Ainj = ASTD and Ainj = ASTD/2.
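The count-subtraction step can be sketched as uniform thinning of an event list. The sketch below is an assumption about how "uniform subtraction throughout the file" might be done (an evenly spaced deterministic selection); the actual offline tool and the list-mode file format are not described here, and `thin_events` is a hypothetical helper operating on a plain Python list.

```python
import math


def thin_events(events, keep_fraction):
    """Keep about keep_fraction of the events, spread evenly through the list."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    kept = []
    for i, event in enumerate(events):
        # Keep an event each time the running quota floor(i * f) advances.
        if math.floor((i + 1) * keep_fraction) > math.floor(i * keep_fraction):
            kept.append(event)
    return kept


events = list(range(1000))       # stand-in for list-mode coincidence events
half = thin_events(events, 0.5)
quarter = thin_events(events, 0.25)
print(len(half), len(quarter))   # 500 250
```

Because events are dropped evenly across the acquisition rather than by truncating the scan, the thinned data keep the same time span as the original and differ only in counting statistics.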
The contralateral control images were processed using the same count-subtraction method to generate a triplet of contralateral controls. Raw list-mode data from the unmatched controls were not available for the count-subtraction procedure.
Images were randomly sorted into 3 groups with similar disease-to-control image ratios, with each group containing a similar distribution of image counts spanning all count levels. No 2 images from a triplet (100%, 50%, and 25%) were in a single group. Groups did contain both 60- and 120-min postinjection images of the same patient, each at a random count level.
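One way to satisfy the constraint that no 2 images from a triplet share a reading group is to shuffle the 3 count levels of each scan and deal one to each group. This is a hypothetical sketch of that idea only; it does not balance disease-to-control ratios, and the function and identifiers are illustrative rather than the procedure actually used.

```python
import random


def assign_triplets(scan_ids, count_levels=(100, 50, 25), seed=0):
    """Deal each scan's 3 count levels to 3 different groups in random order."""
    rng = random.Random(seed)
    groups = [[], [], []]
    for scan_id in scan_ids:
        levels = list(count_levels)
        rng.shuffle(levels)                 # random count level per group
        for group, level in zip(groups, levels):
            group.append((scan_id, level))
    return groups


groups = assign_triplets(["scan_A_60min", "scan_A_120min", "scan_B_60min"])
for index, group in enumerate(groups, start=1):
    print(index, group)
```

Each group then contains every scan exactly once, at a count level that differs from the level shown for that scan in the other 2 groups.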
Interpreter Study
Eight radiologists interpreted the PEM images. All 8 interpreters specialized in breast imaging: 6 radiologists were Mammography Quality Standards Act (MQSA)–certified; 1 was an American Board of Radiology (ABR)–Nuclear Medicine and American Board of Nuclear Medicine physician with 1 y of PEM interpretation experience; and 1 was an ABR physician previously MQSA-certified, with PET/CT fellowship training and 3 y of PEM interpretation experience. The interpreters received training from one of the authors on PEM image interpretation and on dedicated PEM viewing software (MIMVista). Interpreters practiced the scoring process on an independent training set of 16 PEM images, with interactive follow-up before initiating the study. The interpreters viewed single PEM scans (consisting of 12 image slices) and were told to place a region of interest (ROI) on up to 3 18F-FDG foci that they considered suggestive of disease; with each ROI, interpreters provided a confidence score from 1 (almost definitely no lesion) to 5 (almost definitely a lesion present).
Each radiologist interpreted 3 groups of PEM images in at least 3 interpretation sessions on different days. No other patient information, imaging or otherwise, was available to the interpreters.
We created an answer key for the study by identifying the true lesion location on the PEM images by correlation with other imaging (mammography and MRI) and using the final, postsurgery pathology report to confirm locations and sizes.
Data Analyses
As an image noise metric we used the coefficient of variation, COV = (ROI SD)/(ROI mean), from multiple background ROIs placed in areas of relatively uniform 18F-FDG uptake away from lesions, using at least 1 ROI on each image slice. We calculated changes in image noise as the relative (%) change in COV for matched ROIs between 100% and 25% count-level images, and between 60- and 120-min uptake images, which were then averaged across image ROIs and patient images.
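The noise metric is simple enough to state as code. The following is a minimal sketch, assuming the sample standard deviation is used for the ROI SD (the text does not say whether sample or population SD was used); the voxel values are made up for illustration.

```python
import statistics


def roi_cov(voxels):
    """Coefficient of variation of a background ROI: SD / mean."""
    return statistics.stdev(voxels) / statistics.mean(voxels)


def percent_change(cov_reference, cov_new):
    """Relative (%) change in COV between two matched ROIs."""
    return 100.0 * (cov_new - cov_reference) / cov_reference


# Hypothetical voxel values for the same ROI at 100% and 25% counts.
roi_full = [10.0, 12.0, 11.0, 9.0, 8.0]
roi_quarter = [10.0, 15.0, 12.0, 7.0, 6.0]
print(percent_change(roi_cov(roi_full), roi_cov(roi_quarter)))
```

In the study, such per-ROI changes were averaged across ROIs and patient images; that averaging step would be a plain mean over the values returned by `percent_change`.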
We calculated 18F-FDG uptake using the lesion-to-background ratio (LBR), defined as the maximum voxel value in the lesion divided by the mean of an adjacent background ROI. We calculated changes in LBR between 60- and 120-min images and used Student t tests to check for statistical differences between baseline LBR and changes in LBR across lesions of differing pathology.
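The LBR definition and the two-sample comparison can be sketched as follows. This is an illustrative example, assuming a pooled-variance Student t statistic; it returns only the statistic and degrees of freedom, since the Python standard library has no t distribution for computing a P value. All values are hypothetical.

```python
import math
import statistics


def lesion_to_background_ratio(lesion_voxels, background_voxels):
    """Maximum lesion voxel divided by the mean of an adjacent background ROI."""
    return max(lesion_voxels) / statistics.mean(background_voxels)


def student_t(sample_a, sample_b):
    """Two-sample Student t statistic with pooled variance: (t, degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err
    return t_stat, n_a + n_b - 2


print(lesion_to_background_ratio([2.0, 8.0, 4.0], [1.0, 2.0, 3.0]))  # 4.0
```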
To process the interpretation data, we considered the location of each interpreter ROI and its distance to a true lesion, if present. For truly matched controls, we used disease-free regions of the ipsilateral image. On each image with a lesion, the ipsilateral disease-free region was assigned the maximum score of ROIs not associated with any lesion, that is, false-positives. On control images with multiple ROIs, we likewise used each interpreter’s maximum score per image. Light’s κ was used to summarize interinterpreter agreement (20).
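Light's κ is commonly computed as the mean of Cohen's κ over all pairs of raters. The sketch below follows that definition for categorical ratings; it assumes every rater scored the same images in the same order and does not handle the degenerate case in which expected agreement equals 1. The ratings shown are hypothetical.

```python
from itertools import combinations


def cohen_kappa(ratings_1, ratings_2):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(ratings_1)
    categories = set(ratings_1) | set(ratings_2)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    expected = sum((ratings_1.count(c) / n) * (ratings_2.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1.0 - expected)


def light_kappa(ratings_by_rater):
    """Light's kappa: mean Cohen's kappa over all rater pairs."""
    pairs = list(combinations(ratings_by_rater, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# 1 = lesion present, 0 = lesion absent, for 4 images and 3 raters.
raters = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0]]
print(light_kappa(raters))
```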
The overall sensitivity and specificity were estimated by classifying scores of 3 or greater as a positive diagnosis and 2 or less as a negative diagnosis. Sensitivity and specificity were estimated using images grouped by Aeff. Trends in sensitivity by Aeff level were evaluated using odds ratios (ORs) per doubling of Aeff from conditional logistic regression to account for the matched design (each image was reproduced with 50% and 25% of the counts of the original image). The conditional logistic regression models were stratified by subject, lesion, and interpreter, so the ORs correspond to changes in sensitivity due to changes in Aeff for the same interpreter and the same lesion. The nonparametric bootstrap and the percentile method were used to calculate 95% confidence intervals (CIs) and P values by resampling patients to preserve the dependence between images of the same patient (21).
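The score threshold and the patient-level bootstrap can be sketched together. This is a simplified illustration, assuming lesion scores are pooled per patient and that whole patients are resampled with replacement; it covers only the percentile CI for sensitivity, not the conditional logistic regression, and the data layout and function names are hypothetical.

```python
import random


def sensitivity(scores, threshold=3):
    """Fraction of true-lesion scores at or above the positive threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


def bootstrap_ci(scores_by_patient, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for sensitivity, resampling whole patients."""
    rng = random.Random(seed)
    patients = list(scores_by_patient)
    estimates = []
    for _ in range(n_boot):
        resampled = rng.choices(patients, k=len(patients))
        pooled = [s for patient in resampled for s in scores_by_patient[patient]]
        estimates.append(sensitivity(pooled))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical lesion scores (1-5) from several interpreters per patient.
data = {"p1": [5, 4, 4, 3], "p2": [2, 3, 1, 2], "p3": [5, 5, 4, 2], "p4": [3, 1, 2, 4]}
print(bootstrap_ci(data))
```

Resampling patients rather than individual scores keeps all interpretations of a given patient together, which is what preserves the within-patient dependence.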
We used free-response receiver-operating-characteristic (FROC) curve methods, which summarize diagnostic performance while accounting for correct localization of a lesion (22). The matched disease-free regions of the ipsilateral breast were used as the controls because all Aeff levels were available for these controls. The area under the FROC curve (AUC) acts as the figure of merit. The AUC from each curve was compared between Aeff levels using the nonparametric bootstrap.
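As a rough illustration of an AUC figure of merit, the sketch below computes an empirical (Mann-Whitney) AUC from lesion scores and scores of matched disease-free regions. This is a simplification offered under stated assumptions, not the FROC method of reference 22: it assumes localization has already been resolved so that each lesion and each control region carries a single score.

```python
def empirical_auc(lesion_scores, control_scores):
    """Probability that a lesion outscores a control region (ties count one-half)."""
    wins = 0.0
    for lesion in lesion_scores:
        for control in control_scores:
            if lesion > control:
                wins += 1.0
            elif lesion == control:
                wins += 0.5
    return wins / (len(lesion_scores) * len(control_scores))


# Hypothetical confidence scores on the 1-5 scale.
print(empirical_auc([5, 3, 2], [1, 3]))  # 0.75
```

A bootstrap comparison between two Aeff levels would recompute this quantity for each level on every resample of patients and examine the distribution of the difference.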
All statistical calculations were conducted with the statistical computing language R (version 3.1.1; R Foundation for Statistical Computing). Throughout, 2-sided tests were used, unless otherwise specified, with statistical significance defined as a P value of less than 0.05. P values were not adjusted for multiple comparisons.
RESULTS
The final patient cohort consisted of 30 breast cancer cases. Twenty-three patients had a single lesion, 6 had 2 lesions, and 1 had 3 lesions. There were 12 matched controls and 8 unmatched control patients, with 1 providing control images from both breasts. After adjusting for Tup and acquisition duration according to Equation 1, the 8 unmatched controls were categorized with Aeff = ASTD. Table 1 provides patient demographics and imaging parameters, and Figure 1 is a study flowchart.
TABLE 1. Patient Demographics and Imaging Parameters
FIGURE 1. Study patient flowchart. CC = craniocaudal view; MLO = mediolateral oblique view.
A total of 232 distinct images were available for review. This included 180 breast cancer images, 35 contralateral control images, and 17 unmatched control images. Breast cancer images were generated from 30 patients, with 2 Tup and 3 count levels. Control images came from count-subtracted triplets of contralateral controls and dual-view unmatched control scans.
Eight interpreters produced 1,814 usable reviews of the 1,856 assigned. Forty-two (2.3%) could not be used because the interpreter did not leave a rating on the image, which occurred with approximately the same proportion for cases and controls. There was moderately good interinterpreter agreement on the presence/absence of lesions across all images (κ = 0.55; 95% confidence interval, 0.47–0.63).
LBR was seen to increase between 60- and 120-min scans for all lesion types (Table 2). The increase in LBR was significantly higher for IDC than for DCIS (P = 0.01), but no other pairs differed significantly. Two-sample t tests for differences in mean LBR for differing lesion pathologies showed statistical significance only between IDC and DCIS at both 60-min (P = 0.02) and 120-min (P = 0.01) time points.
TABLE 2. LBR
Image noise as measured by background ROI COV followed the expected trends of increasing for images reconstructed using fewer counts and for later-time-point imaging. Image noise versus image counts in this study followed a nonlinear trend similar to the one seen with phantom image tests (15), in which COV changes slowly for higher Aeff (Aeff > ASTD/2), then begins to increase rapidly for lower image counts (Aeff < ASTD/2).
Figure 2 shows example images for each lesion pathology at different Tup and percentage count levels and a control image.
FIGURE 2. Example PEM images at different count levels, acquired 60 and 120 min after 18F-FDG injection.
There was a statistically significant increasing trend in interpreter sensitivity for diagnosing the presence of a lesion with each doubling of Aeff for both the 60-min Tup images (OR = 2.2 per doubling of Aeff, P = 0.01) and the 120-min Tup images (OR = 1.9 per doubling of Aeff, P = 0.04), as shown in Figure 3 and Table 3. These trends in sensitivity were not significantly different between the 60- and 120-min images (P = 0.7).
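To give a sense of what an OR per doubling of Aeff implies, the sketch below converts a baseline sensitivity to odds, applies the OR once per doubling, and converts back. This is an illustrative calculation only: the reported ORs are conditional (within interpreter and lesion), so applying them to an overall sensitivity is an approximation, and the baseline value of 0.5 is hypothetical.

```python
def sensitivity_after_doublings(base_sensitivity, odds_ratio_per_doubling, n_doublings):
    """Apply an odds ratio per doubling of Aeff to a baseline sensitivity."""
    odds = base_sensitivity / (1.0 - base_sensitivity)
    odds *= odds_ratio_per_doubling ** n_doublings
    return odds / (1.0 + odds)


# Hypothetical baseline of 50% sensitivity, OR = 2.2 per doubling.
print(sensitivity_after_doublings(0.5, 2.2, 1))   # one doubling of Aeff
print(sensitivity_after_doublings(0.5, 2.2, -1))  # one halving of Aeff
```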
FIGURE 3. Overall sensitivity across interpreters by Aeff, based on 38 lesions from 30 patients. There were statistically significant trends between increasing dose and increasing sensitivity (Table 3). Below each bar is the corresponding number of interpretations used in calculations (each image of each lesion was interpreted by up to 8 interpreters). Bootstrap 95% confidence intervals for each bar are approximately ±21.2%, ±12.0%, ±11.9%, and ±12.9%, respectively, for 60-min images and ±18.8%, ±12.9%, ±10.3%, and ±9.5%, respectively, for 120-min images.
TABLE 3. Conditional Logistic Regression Analysis of Sensitivity Trends for Doubling Aeff Within Subgroups
Diagnostic sensitivity by Aeff was also explored within subgroups defined by the different lesion types. Qualitatively, sensitivity was most strongly affected by Aeff for ILC with 120-min uptake (OR = 4.3 per doubling of Aeff) and least for IDC at 60 or 120 min (OR = 1.5–1.6) and DCIS at 120 min (OR = 1.6) (Table 3). Sensitivity tended to be lowest for DCIS at nearly all Aeff levels and Tup (Fig. 4). The trends with Aeff did not change when Aeff was further normalized by patient weight and compression thickness (attenuation correction).
FIGURE 4. Detection sensitivity by pathology for 60-min images (A) and 120-min images (B). Table 3 shows tests of trends between sensitivity and Aeff. Calculations are based on 14 IDC, 10 ILC, and 14 DCIS lesions (38 lesions in total) from 30 patients. Below each bar is the corresponding number of interpretations used in calculations (each image of each lesion was interpreted by up to 8 interpreters). Because of small sample sizes in subgroups, the width of the bootstrap 95% confidence interval for each bar ranges from 22% to 48% (median, 37%) for IDC, 15% to 70% (median, 33%) for ILC, and 36% to 76% (median, 47%) for DCIS.
Table 4 summarizes specificity by Aeff using different control images. The only type of controls that was available for each Aeff level was the ipsilateral disease-free region of the images with lesions present (ipsilateral controls). The images of the contralateral disease-free breasts were always acquired after the 120-min images, so Aeff was lower due to tracer decay. On the basis of the ipsilateral controls, the average specificity across Aeff levels was 73% (60 min) and 71% (120 min) without a clear trend across levels. Specificity was lower using the other types of controls, but the sample size of those controls was substantially lower and they did not cover all Aeff levels.
TABLE 4. Specificity of Aeff Using Each Type of Control Image
FROC curves are shown in Figure 5 for each Aeff level. The AUC was highest for ASTD at both 60 and 120 min of uptake. At 60 min, the ASTD AUC was significantly higher than the ASTD/2 AUC (0.83 vs. 0.75, P = 0.02). There was a similar trend at 120 min (0.79 vs. 0.73, P = 0.1). The AUCs at ASTD/4 and ASTD/8 were similar to that at ASTD/2 and ranged from 0.73 to 0.75.
FIGURE 5. FROC curves for each Aeff (solid and dashed curves) using 60-min images (A) and 120-min images (B). AUC for ASTD was significantly higher than for ASTD/2 using 60-min images (AUC, 0.83 vs. 0.75, P = 0.02), with similar trend for 120-min images (AUC, 0.79 vs. 0.73, P = 0.1).
DISCUSSION
This study showed trends of decreasing lesion detection sensitivity in PEM Flex images when image count levels were reduced from our standard PEM scan protocol. The FROC analysis also yielded the best results for the standard injection protocol via a significantly higher AUC for Aeff = ASTD, relative to lower Aeff. These results suggest that for the highest interpretation accuracy, Ainj should not be lowered below commonly used levels without a proportional increase in Ts using the PEM Flex Solo II. This result held for 18F-FDG Tup of 60 and 120 min, without significant difference between the 2, despite the fact that 120-min images had higher LBR, which should increase lesion conspicuity.
The trends were similar when analyzed by lesion pathology type. The differences between lesion types observed here are consistent with prior studies of 18F-FDG in breast lesions of differing pathology (17,23,24). In this study, the LBR was lower in ILC (P = 0.4) and in DCIS (P ≤ 0.02), relative to IDC. Further, we observed that ILC detection sensitivity had the strongest dependence on Aeff and that sensitivities were lower overall for DCIS at all Aeff.
Matching the image count levels between contralateral control images and case studies is challenging when each breast must be imaged sequentially, as is the case with the PEM Flex scanner. The delay between scans changes the count level due to radioisotope decay. We corrected for this, and for differences in acquisition duration, by categorizing images according to Aeff (Eq. 1). In retrospect, a preferable study design would have acquired contralateral breast images between the 60- and 120-min scans of the ipsilateral breast. Instead, we acquired contralateral matched control images after the 120-min ipsilateral scan, which inhibited ROC analysis using contralateral controls due to insufficient matched case/controls in each Aeff category.
Variable patient weight and breast size influence how much activity is in the scanner field of view, thus leading to different PEM image count densities for a given Ainj. Although this variability makes it difficult to predict the number of counts that will be collected for a given Ainj, we did find a strong correlation (Pearson ρ = 0.61) between counts collected in the PEM images and Aeff. As noted, further adjustments to account for patient mass and photon attenuation did not change our results. Variable counts within the Aeff categories did not confound the comparisons in this study because the paired-image study design ensured that comparisons were made between images with known count differences.
No clear trends were seen in the relationship between specificity and Aeff, perhaps due to the limited sample size. An initial concern was that pronounced nonuniform noise textures in higher-noise images could be mistaken for focal uptake of tracer, thus reducing specificity. Alternatively, if fewer suspected foci were marked on lower-count control images, as was the case on images with disease, then specificity could improve.
Radiologists performing the study had different backgrounds and mixed experience interpreting PEM images. There was reasonable agreement between interpreters (κ = 0.55). A study by Narayanan et al. showed that experienced breast imagers interpreted PEM images with high performance after minimal training (25).
The geometry of the PEM Flex with 2 small bar detectors results in limited-angle data acquisition and consequently tomosynthesis image reconstructions with anisotropic spatial resolution. Breast PET scanners whose detectors completely surround the breast (4,11,12) or that rotate to obtain complete 360° angular sampling (6,7) are capable of fully 3-dimensional, isotropic tomographic images. The impact of the anisotropic resolution on our results, and potential benefits of isotropic resolution for lesion detection, are not clear. Another consequence of tomosynthesis is that image quantification of tracer uptake is typically compromised (26), although there are approaches to overcome this limitation (27).
PET detectors that completely surround the breast provide much higher photon detection sensitivity than scanning-detector systems, and as a result, lower image noise would be expected from such systems for a given acquisition protocol. Consequently, such systems would be expected to reach their maximum diagnostic accuracy at a lower Ainj × Ts level than scanning systems. We continue to assert that there is a limiting Ainj × Ts level above which diagnostic accuracy will increase negligibly or not at all. The data in this study showed steady increases in detection sensitivity up to the highest Ainj × Ts that we tested (2.59 GBq-min), suggesting that we did not reach this limiting value or a level at which accuracy begins to plateau.
CONCLUSION
Our hypothesis that lowering PEM Flex image counts below what is obtained from our standard protocol would not change image interpretation was not substantiated by this study; diagnostic accuracy was lower on images with fewer counts than expected from an Ainj × Ts product of 2.59 GBq-min (at 60-min Tup). Hence, reducing injected activity without degrading image interpretation is limited by the capacity to increase Ts. The trend for lower lesion detection sensitivity with lower image counts may contribute to our observation that images acquired 120 min after injection did not have improved diagnostic accuracy despite the higher LBR in those images.
DISCLOSURE
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734. This work was supported by the Swedish Foundation, Swedish Medical Center. No other potential conflict of interest relevant to this article was reported.
Acknowledgments
We thank Angie King, Sarah Fanizzi, Julie Cleveland, Claire Buchanan, Patricia Dawson, James Hanson, and Christine Lee for patient recruitment and Weidong Luo for assistance in data count subtraction.
Footnotes
Published online Dec. 3, 2015.
© 2016 by the Society of Nuclear Medicine and Molecular Imaging, Inc.