Abstract
We evaluated observer agreement for 68Ga-DOTATATE PET/CT interpretations in patients with neuroendocrine tumor (NET). Methods: 68Ga-DOTATATE PET/CT was performed on 50 patients with known or suspected NET of the small bowel (n = 19), pancreas (n = 14), lung (n = 4), or other location (n = 13). The images were reviewed by 7 observers, who used a standardized interpretation approach. The observers were classified as having a low level of experience (<500 scans or <5 y experience with 68Ga-DOTATATE PET/CT; n = 4) or a high level of experience (≥500 scans or ≥5 y experience with 68Ga-DOTATATE PET/CT; n = 3). Interpretation by the primary nuclear medicine physician, who had access to all clinical and imaging data, served as the reference standard. Interobserver agreement was determined by the Cohen κ statistic and intraclass correlation coefficient (ICC) with corresponding 95% confidence interval (95%CI). Results: Interobserver agreement was substantial, and the median number of false findings was low for the overall scan result: that is, positive versus negative scan result (κ = 0.80; 95%CI, 0.74–0.86; false findings, 3), organ involvement (κ = 0.70; 95%CI, 0.64–0.76; false findings, 5), and lymph node involvement (κ = 0.71; 95%CI, 0.65–0.78; false findings, 6). Interobserver agreement was substantial to almost perfect, and the average absolute difference (Δ) from the reference observer was low for number of organ and lymph node metastases (organ: ICC, 0.84; 95%CI, 0.77–0.89; Δ = 0.45; lymph node: ICC, 0.77; 95%CI, 0.69–0.84; Δ = 0.45), tumor SUVmax (ICC, 0.99; 95%CI, 0.97–0.99; Δ = 0.44), and reference SUV (spleen: ICC, 0.81; Δ = 1.10; liver: ICC, 0.79; Δ = 0.62). 
Interpretations of appropriateness for peptide-receptor radionuclide therapy varied more among observers (κ = 0.64; 95%CI, 0.57–0.70), and false-positive recommendations for peptide-receptor radionuclide therapy were more frequent among observers with low experience than among those with high experience (range, 7–12 vs. 4–8). Conclusion: The interpretation of 68Ga-DOTATATE PET/CT images for NET staging is consistent among observers with low and high levels of experience. However, image-based recommendations for or against peptide-receptor radionuclide therapy require experience and training.
Overexpression of cell surface somatostatin receptors (SSRs) in well-differentiated neuroendocrine tumors (NETs) can be exploited for imaging and therapy with radiolabeled somatostatin analogs. SSR scintigraphy using 111In-octreotide (OctreoScan; Mallinckrodt Pharmaceuticals) has been available for more than 20 y. Over the past few decades, scintigraphy was gradually replaced by 68Ga-DOTATOC or 68Ga-DOTATATE PET/CT imaging because of its superior accuracy for NET staging (1–4). PET/CT using 68Ga-labeled somatostatin analogs is now considered the gold standard, and its superiority over scintigraphy is also emphasized in the European Neuroendocrine Tumor Society guidelines (5).
68Ga-DOTATATE PET/CT demonstrates high accuracy for NET staging and is an important companion diagnostic to the highly effective 177Lu-DOTATATE peptide receptor radionuclide therapy (PRRT) (6). High remission rates after PRRT correlated positively with intense 111In-octreotide uptake on pretherapy SSR imaging, graded on a liver-based 4-point scale for tracer accumulation (the Krenning scale) (7). By analogy, 68Ga-DOTATATE uptake might predict the likelihood of response to PRRT. 68Ga-DOTATATE has now received approval by the U.S. Food and Drug Administration for NET imaging. However, little is known about interobserver differences in 68Ga-DOTATATE PET/CT interpretation. The overall value of an imaging method is associated with the degree of observer agreement. Knowledge of interobserver variability and reproducibility is therefore essential for interpreting study results and designing future trials. The aim of this study was to determine interobserver agreement for interpretations of 68Ga-DOTATATE PET/CT images and to compare findings between observers with low and high levels of experience.
MATERIALS AND METHODS
Patients and Image Acquisition
From June 2013 until March 2014, 50 patients with known or suspected NET of the small bowel (n = 19), pancreas (n = 14), lung (n = 4), or other location (n = 13) were prospectively recruited under a Food and Drug Administration–approved Investigational New Drug application. The patients had been referred for NET staging (n = 10) or restaging (n = 40). All had been included in a previously published study on the impact of 68Ga-DOTATATE PET/CT on patient management (8). The prospective study was approved by the University of California, Los Angeles, Institutional Review Board, and all subjects gave written informed consent. The patients were prepared and the images acquired as previously described (8). In brief, 68Ga-DOTATATE was injected intravenously at a dose of 190 ± 17 MBq (5.1 ± 0.5 mCi; range, 130–211 MBq [3.5–5.7 mCi]). A tracer uptake period of 60 min (mean, 62 ± 7 min) was allowed before imaging, which was performed using a Biograph True Point 64 or Biograph mCT device (Siemens). For anatomic correlation and identification of organ lesions, intravenous contrast medium (90–115 mL of Omnipaque 350 [GE Healthcare] at a flow rate of 2 mL/s) was administered to 47 of the 50 patients (94%). Oral contrast medium (∼600 mL of barium sulfate [Readi-Cat 2; Bracco]) was given to all patients within 1 h before the scan.
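The paired MBq and mCi values quoted above follow from the definition 1 mCi = 37 MBq. A minimal sketch of the conversion, using only the activities stated in the text (the function name is illustrative, not part of the study protocol):

```python
# 1 mCi is defined as 37 MBq (1 Ci = 3.7e10 Bq).
MBQ_PER_MCI = 37.0


def mbq_to_mci(activity_mbq: float) -> float:
    """Convert an activity from megabecquerels to millicuries."""
    return activity_mbq / MBQ_PER_MCI


# Injected-activity values quoted in the text (MBq): mean, SD, and range.
for label, mbq in [("mean", 190), ("SD", 17), ("min", 130), ("max", 211)]:
    print(f"{label}: {mbq} MBq = {mbq_to_mci(mbq):.1f} mCi")
```

Rounded to one decimal place, the results match the values reported in parentheses (5.1, 0.5, 3.5, and 5.7 mCi).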
Observers
Anonymized PET/CT images (one per patient) were electronically submitted to 7 nuclear medicine physicians from 5 centers in Europe (n = 4) or North America (n = 1). The data included standard DICOM files of CT, attenuation-corrected PET, and uncorrected PET images. All physicians had at least 5 y of experience with interpreting oncologic 18F-FDG PET/CT images. All centers had been performing 68Ga-DOTATATE PET/CT on a regular basis for at least 2 y, and all physicians had experience with 68Ga-DOTATATE PET/CT. The observers reported their number of years of experience and the number of 68Ga-DOTATATE PET/CT scans that they had interpreted. On the basis of these reports, the observers were grouped into two experience categories: low level (<500 scans or <5 y of experience with 68Ga-DOTATATE PET/CT; n = 4) and high level (≥500 scans or ≥5 y of experience with 68Ga-DOTATATE PET/CT; n = 3).
Visual Interpretation
Each observer received a written guide to image interpretation and two patient examples, which are shown in the supplemental materials (available at http://jnm.snmjournals.org), and reported the results in a table template. The following patient information was disclosed to each observer before image interpretation: indication (staging or restaging), sex (male or female), age (y), height (cm), weight (kg), injected dose (mCi), uptake time (min), CT protocol (contrast-enhanced or nonenhanced), prior therapy (yes or no), prior chemotherapy (yes or no), prior PRRT (yes or no), prior surgery (yes or no), and prior octreotide therapy (yes or no). The observers were masked to all other clinical data.
The images were visually interpreted for the following: overall scan result for presence or absence of disease, SSR density in NET tissue (none, 0; low, 1; intermediate, 2; or high, 3), indication for PRRT (yes or no), organs affected (yes or no), number of organs affected (0, 1, 2, 3, 4, or ≥5), number of organ metastases detected (0, 1, 2, 3, 4, or ≥5), lymph nodes affected (yes or no), number of lymph node regions affected (0, 1, 2, 3, 4, or ≥5), and number of lymph node metastases detected (0, 1, 2, 3, 4, or ≥5). The criteria for a PRRT indication were, among others, intense tracer uptake by tumor lesions and metastatic spread.
SUV Measurement
Each observer recorded series number, image number, location, and SUVmax for up to 3 target lesions. The lesions were chosen from each diseased organ system following predefined criteria as outlined in the written guide to image interpretation.
Each observer measured SUVmax and SUVmean using a 5-cm-diameter circular region of interest in a lesion-free location in the right hepatic lobe and a 2-cm-diameter circular region of interest in lesion-free splenic parenchyma. To ensure consistency, the observers were provided with a PET/CT dataset and SUV data for one test patient for comparison.
Reference Standard and Statistical Analyses
Data acquisition and analyses were performed prospectively. Image interpretations by a University of California, Los Angeles, physician who had access to all baseline and follow-up clinical information served as the reference standard. For binary data, agreement between each observer and the reference observer was evaluated using the Cohen κ statistic (9). Overall agreement using pooled observer data was evaluated using generalized estimating equations (10). For nonbinary data, agreement among observers was evaluated by the intraclass correlation coefficient (ICC) using a 2-way mixed model for absolute agreement (single measures) (11). To calculate ICC for tumor SUVmax, one target lesion, that is, the lesion reported by most observers, was chosen per patient. Ninety-five percent confidence intervals (95%CIs) are reported for κ and ICC values. Interpretation of κ and ICC was based on a classification provided by Landis and Koch (12): <0.0, poor; 0.0–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1.00, almost perfect reproducibility.
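As an illustration of the binary agreement analysis, the following is a minimal sketch of the Cohen κ statistic for one observer against the reference observer, together with the Landis and Koch verbal categories. It is not the authors' code (the analyses were run in R and SPSS); the function names and the example ratings are hypothetical, and confidence intervals and the pooled model are not covered.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical ratings.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical category; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def landis_koch(value):
    """Map a kappa or ICC value to the Landis and Koch category."""
    if value < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings (1 = positive scan, 0 = negative scan), 10 patients.
observer = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
reference = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
kappa = cohen_kappa(observer, reference)
print(f"kappa = {kappa:.2f} ({landis_koch(kappa)})")
```

In this made-up example the two raters agree on 8 of 10 cases, but because chance agreement is 0.52, κ is only about 0.58, which falls in the moderate band.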
Discrepancies in quantitative ratings among observers were expressed as mean absolute difference (Δ) ± SD. Statistical analyses were performed using R software (R Core Team 2015; R Foundation for Statistical Computing) with the package “irr” (version 0.84) for generalized estimating equation modeling and SPSS (version 15.0; SPSS Inc.) for all other statistical analyses.
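The abstract and Results describe Δ as an average absolute difference from the reference observer. A minimal sketch of that summary is given below, assuming paired per-patient values and the sample SD (n − 1 denominator); the text does not state which SD estimator was used, and the SUV values shown are hypothetical.

```python
from statistics import mean, stdev


def absolute_difference_summary(observer_values, reference_values):
    """Return (mean, SD) of |observer - reference| over paired cases."""
    if len(observer_values) != len(reference_values):
        raise ValueError("value lists must have equal length")
    if len(observer_values) < 2:
        raise ValueError("at least two paired cases are needed for an SD")
    diffs = [abs(o - r) for o, r in zip(observer_values, reference_values)]
    # stdev() uses the n - 1 denominator (an assumption of this sketch).
    return mean(diffs), stdev(diffs)


# Hypothetical liver SUV readings for four patients.
observer_suv = [5.2, 6.0, 4.8, 7.1]
reference_suv = [5.0, 6.5, 4.8, 6.6]
delta, sd = absolute_difference_summary(observer_suv, reference_suv)
print(f"delta = {delta:.2f} +/- {sd:.2f}")
```

Taking the absolute value before averaging prevents over- and underestimates from cancelling, so Δ reflects the typical size of a discrepancy rather than its direction.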
RESULTS
Patient Characteristics
Table 1 summarizes the patient characteristics. The reference observer rated the 68Ga-DOTATATE PET/CT studies of 37 (74%) of the 50 patients as positive for NET: 21 (42%) as stage N1 and 34 (68%) as stage M1.
Visual Interpretation
Interobserver agreement on the visual interpretation is shown in Table 2. Reproducibility was substantial to almost perfect for the overall scan result, organ involvement, lymph node involvement, and the respective subitems (i.e., number of organs, organ metastases, lymph node areas, lymph node metastases [each ICC or κ ≥ 0.70]). The mean absolute difference from the reference observer was low for number of organs, lymph node areas, and metastases (each Δ < 0.5), and there was no relevant difference between observers with a low level of experience and those with a high level. However, interobserver agreement on whether PRRT was indicated ranged from only moderate to substantial (κ = 0.64; 95%CI, 0.57–0.70).
False-positive and false-negative findings, as well as level of agreement between individual observers and the reference observer, are listed in Table 3. For the overall scan result, organ involvement, and lymph node involvement, observers with either level of experience demonstrated a low frequency of false-positive and false-negative findings (range, 0–6).
There was a false-positive overall scan result for 5 of the 7 observers; an example is shown in Figure 1. Three of the 4 observers with a low level of experience had fair agreement with the reference observer on whether PRRT was indicated. The number of erroneous recommendations for PRRT was higher for observers with low experience (range, 7–12) than for those with high experience (range, 4–8). Details on individual test performance, as well as sensitivity and specificity values, are available in Supplemental Table 1.
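For readers who wish to reproduce per-observer counts of this kind, the sketch below tallies false-positive and false-negative findings against the reference observer and derives sensitivity and specificity. The function names and the ratings are hypothetical and do not reproduce the data in Table 3 or Supplemental Table 1.

```python
def diagnostic_counts(observer, reference):
    """Count (tp, fp, tn, fn) for binary ratings against a reference."""
    if len(observer) != len(reference):
        raise ValueError("rating lists must have equal length")
    tp = fp = tn = fn = 0
    for obs, ref in zip(observer, reference):
        if obs and ref:
            tp += 1  # both positive
        elif obs and not ref:
            fp += 1  # observer positive, reference negative
        elif not obs and ref:
            fn += 1  # observer negative, reference positive
        else:
            tn += 1  # both negative
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Return (sensitivity, specificity); NaN where undefined."""
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical PRRT recommendations (1 = indicated, 0 = not indicated).
observer = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
reference = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
tp, fp, tn, fn = diagnostic_counts(observer, reference)
sens, spec = sensitivity_specificity(tp, fp, tn, fn)
print(f"FP = {fp}, FN = {fn}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```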
SUV Measurements
Interobserver agreement on SUV measurements is given in Table 4. Agreement was almost perfect for tumor SUVmax (ICC, 0.99). Liver SUVmax and spleen SUVmean were highly reproducible (ICC, 0.79 and 0.81, respectively), with a low mean absolute difference (Δ < 1.2) when compared with the SUV measurements of the reference observer. The mean absolute difference was comparable between observers with low experience and those with high experience. Figure 2 illustrates agreement on individual SUV measurements.
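The ICC values for SUV measurements use the 2-way, absolute-agreement, single-measures form named in the Methods. The sketch below computes that coefficient from the ANOVA mean squares, assuming the commonly used ICC(A,1) formula and a complete patients-by-observers table with no missing readings; it is an illustration, not the SPSS procedure used in the study, and the example values are hypothetical.

```python
def icc_absolute_single(ratings):
    """ICC(A,1): two-way model, absolute agreement, single measures.

    ratings: list of rows, one per patient, each with one value per observer.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and columns")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for patients (rows), observers (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (msr - mse) / denom


# Hypothetical SUVmax readings: 3 patients (rows) by 2 observers (columns).
# The second observer reads exactly 1 unit higher in every patient.
suv = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(f"ICC(A,1) = {icc_absolute_single(suv):.2f}")
```

Because absolute agreement penalizes systematic offsets between observers, the constant 1-unit shift in this example lowers the ICC to about 0.67, although the two observers rank the patients identically.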
DISCUSSION
68Ga-DOTATATE PET/CT image interpretation is not without pitfalls. Up to 70% of high-grade NET lesions are 68Ga-DOTATATE PET–negative because of low or even absent SSR expression (13). On the other hand, inflammation with recruitment of SSR-expressing macrophages may lead to false-positive findings (14). Physiologic uptake in the adrenal glands, the pituitary gland, and the uncinate process of the pancreas can mask NET lesions or be misinterpreted as tumor tissue (15). Other pitfalls include uptake at sites of osteoblastic activity, splenules, splenosis, and SSR expression in hemangioma and other benign or malignant tumors of nonneuroendocrine origin (16,17). All these processes may result in interpretative errors that ultimately limit the accuracy of 68Ga-DOTATATE PET/CT. To reduce the frequency of errors, most study protocols include a consensus image analysis by multiple observers. However, the level of interobserver agreement cannot be discerned from consensus readings (8,13,18–20). Here, we provide evidence of substantial to almost perfect agreement among 7 observers with varying levels of experience. Agreement was determined for both visual and semiquantitative analyses in multiple predefined categories. Previous reports included reproducibility as secondary or tertiary endpoints and compared findings from no more than 3 observers, who had comparable or unknown levels of experience. Deppen et al. reported almost perfect reproducibility (κ = 0.82) between two masked observers and one nonmasked observer for interpretation of the 68Ga-DOTATATE PET/CT images of 78 patients with pulmonary or gastroenteropancreatic NET (3). Ruf et al. analyzed agreement between two observers after they separately analyzed the 68Ga-DOTATOC PET images and triple-phase CT images of 51 NET patients. Agreement was substantial for PET (κ = 0.77) but only fair to moderate for triple-phase and single-phase CT (21).
Two trials used separate analyses by independent observers but did not report reproducibility (22,23).
Interobserver agreement is an important aspect of clinical applicability. Complex protocols can be associated with reduced interobserver agreement on NET staging, as has been demonstrated for triple-phase CT (21) and single- or multisequence MRI (24,25). Here, we have shown that both visual and semiquantitative 68Ga-DOTATATE PET/CT interpretation is highly reproducible among observers with both low and high levels of experience. Our findings indicate that 68Ga-DOTATATE PET/CT has a high interobserver reliability for NET staging in a clinical or research setting, even if the images are interpreted by less experienced observers. Interestingly, observers with lower experience levels inappropriately recommended PRRT with a higher frequency than did highly experienced observers. 68Ga-DOTATATE PET/CT images should thus be interpreted by experienced observers if PRRT is being considered in NET patients.
The study had several limitations. First, the observers had access to only the 68Ga-DOTATATE PET/CT images and a limited set of patient information, whereas the reference observer had access to all available clinical and image data. Clinical data such as type and duration of prior therapy are important in determining the appropriateness for PRRT. Lack of clinical data created a disadvantage that might have led to the higher frequency with which the observers inappropriately found that PRRT was indicated. Second, the observers were grouped on the basis of their experience with 68Ga-DOTATATE PET/CT. However, the skills of an observer are determined by multiple additional factors, including clinical knowledge and experience with other imaging modalities. Thus, unknown factors might have led to an underestimation of the true expertise, especially for observers in the low-experience group.
CONCLUSION
Both visual and semiquantitative analysis of 68Ga-DOTATATE PET/CT images is highly reproducible among observers with varying experience. Diagnostic information gained from 68Ga-DOTATATE PET/CT in an appropriate clinical or research setting can thus be considered reliable. 68Ga-DOTATATE PET/CT interpretation should be performed by experienced observers if PRRT is being considered.
DISCLOSURE
This study was partly funded by a 2015 seed grant from the Hirshberg Foundation for Pancreatic Cancer Research. Dr. Fendler received a scholarship from the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG). No other potential conflict of interest relevant to this article was reported.
Footnotes
Received for publication June 1, 2016.
Accepted for publication August 1, 2016.
Published online Aug. 18, 2016.
© 2017 by the Society of Nuclear Medicine and Molecular Imaging.