Abstract
18F-Fluciclovine is a novel PET/CT tracer. This blinded image evaluation (BIE) sought to demonstrate that, after limited training, readers naïve to 18F-fluciclovine could interpret 18F-fluciclovine images from subjects with biochemically recurrent prostate cancer with acceptable diagnostic performance and reproducibility. The primary objectives were to establish individual readers’ diagnostic performance and the overall interpretation (2/3 reader concordance) compared with standard-of-truth data (histopathology or clinical follow-up) and to evaluate interreader reproducibility. Secondary objectives included comparison to the expert reader and assessment of intrareader reproducibility. Methods: 18F-Fluciclovine PET/CT images (n = 121) and corresponding standard‐of‐truth data were collected from 110 subjects at Emory University using a single-time-point static acquisition starting 5 min after injection of approximately 370 MBq of 18F-fluciclovine. Three readers were trained using standardized interpretation methodology and subsequently evaluated the images in a blinded manner. Analyses were conducted at the lesion, region (prostate, including bed and seminal vesicle, or extraprostatic, including all lymph nodes, bone, or soft-tissue metastasis), and subject level. Results: Lesion-level overall positive predictive value was 70.5%. The readers’ positive predictive value and negative predictive value were broadly consistent with each other and with the onsite read. Sensitivity was highest for readers 1 and 2 (68.5% and 63.9%, respectively) whereas specificity was highest for reader 3 (83.6%). Overall, prostate-level sensitivity was high (91.4%), but specificity was moderate (48.7%). Interreader agreement was 94.7%, 74.4%, and 70.3% for the lesion, prostate, and extraprostatic levels, respectively, with associated Fleiss’ κ-values of 0.54, 0.50, and 0.57. 
Intrareader agreement was 97.8%, 96.9%, and 99.1% at the lesion level; 100%, 100%, and 91.7% in the prostate region; and 83.3%, 75.0%, and 83.3% in the extraprostatic region for readers 1, 2, and 3, respectively. Concordance between the BIE and the onsite reader exceeded 75% for each reader at the lesion, region, and subject levels. Conclusion: Specific training in the use of standardized interpretation methodology for assessment of 18F-fluciclovine PET/CT images enables naïve readers to achieve acceptable diagnostic performance and reproducibility when staging recurrent prostate cancer.
18F-Fluciclovine (anti-1-amino-3-18F-fluorocyclobutane-1-carboxylic acid [18F-FACBC]) is a synthetic amino acid PET tracer approved by the Food and Drug Administration for the detection of sites of recurrence in men with rising prostate-specific antigen levels after prior primary treatment of prostate cancer (1), based on its diagnostic performance (2–4).
To facilitate migration of the technology from expert sites into general clinical use, standardized methodologies for the preparation of subjects and the acquisition and interpretation of 18F-fluciclovine images were determined by a panel of 6 experts at a 2014 consensus meeting in Bologna, Italy. Here, we report the results of a blinded image evaluation (BIE) study that used these predetermined criteria to train experienced PET/CT readers in the interpretation of 18F-fluciclovine images. We sought first to establish that naïve readers could, after training, interpret 18F-fluciclovine PET/CT images in the biochemically recurrent (BCR) setting to an acceptable standard. Secondary to this, we sought to establish how reliable the interpretations of these naïve readers were and how they compared with those of readers who were experienced in 18F-fluciclovine imaging and had access to the subjects’ clinical information. To this end, the primary objectives of this study were to assess the diagnostic performance of 18F-fluciclovine compared with histopathology or clinical follow-up and to assess the interreader reproducibility of 18F-fluciclovine image interpretation. Secondary objectives were to determine the intrareader reproducibility of 18F-fluciclovine image interpretation and the degree of concordance between the BIE majority interpretation and evaluations performed onsite.
MATERIALS AND METHODS
A schematic representation of this prospective BIE study (BED002; Blue Earth Diagnostics [BED]) design is presented in Figure 1.
FIGURE 1. Study design. *Random 10% of images were reread.
Images
The image acquisition and interpretation guidelines used by the blinded readers were defined after a 18F-fluciclovine reader consensus meeting held in June 2014 by the Society of Nuclear Medicine and Molecular Imaging Clinical Trials Network on behalf of BED. The meeting was conducted using a modified Delphi technique to collate opinion and obtain consensus and was attended by 6 experts proficient in the reading of 18F-fluciclovine PET/CT cases (mean, 280; range, 100–380). In addition, readers were instructed to read a publication describing the biodistribution and natural variants of fluciclovine uptake (5).
The data for the present study were collected in a retrospective, observational study, BED001 (NCT02443571), the methods for which are described elsewhere (4). The Emory University institutional review board approved the BED001 retrospective study, and the requirement to obtain informed consent was waived. No additional consent was required for the present analysis. 18F-Fluciclovine PET/CT images (n = 121) and corresponding standard-of-truth (SOT) data from 110 subjects with BCR prostate cancer were collected from the database at Emory University. The 18F-fluciclovine images were captured from the first static acquisition of a dual-time-point acquisition protocol, undertaken from the symphysis pubis to above the diaphragm, starting 5 min after injection of approximately 370 MBq of 18F-fluciclovine for 3–4 min per bed position. Images were acquired on a Discovery DLS (2-dimensional acquisition mode) (n = 113) or a 690 (3-dimensional acquisition mode) (n = 8) PET/CT scanner (GE Healthcare).
Data for BED002 were also collected from images captured at Bologna University (96 images from 88 subjects), the results of which are to be described in a separate publication.
Readers
Four independent readers were selected and trained so that 3 primary readers (readers 1, 2, and 3) could be assigned to the BIE, with reader 4 available in the event that a primary reader became unavailable. The readers were not associated with any center involved in BED001, nor were they affiliated with the sponsor or any of the sponsor designees. Each reader was considered to have an appropriate level of experience in and, at the time of the study, was regularly engaged in the review of 18F-FDG PET/CT cases and the reporting of results without supervision, and was familiar with cross-sectional abdominal–pelvic anatomy. Moreover, at the time of the study, the readers had been specialty board-certified in nuclear medicine for between 10 and 22 y (mean, 14.7 y). The primary readers were chosen by BED ahead of the reader training to ensure that the proficiency test results did not bias selection. Moreover, neither the readers nor the Clinical Trials Network were notified of the primary reader selection until the completion of training so as to avoid training bias.
Readers could proceed to the BIE only after they had completed a reader training session that was fully documented and verified by the trainer and also passed the proficiency assessment.
Image Evaluation Proficiency
A reader training program using standardized interpretation methodology (Fig. 2) was established by the Clinical Trials Network after the 2014 consensus meeting. The module consisted of background information on 18F-fluciclovine, image interpretation criteria, and demonstration videos of 18F-fluciclovine image reporting. Supplemental Table 1 (supplemental materials are available at http://jnm.snmjournals.org) provides contents of the training module.
FIGURE 2. Image interpretation recommendations.
A proficiency assessment consisting of 10 diverse cases with findings ranging from obscure to obvious was developed to assess the training’s effectiveness. Cases with either histologic confirmation of results or documented clinical follow-up were selected to provide confidence in image findings. All cases for training and proficiency assessments were sourced from a compassionate-use program at Oslo University Hospital and studies conducted at Emory University (these cases were not evaluated in the main BIE).
For the purposes of scoring the proficiency assessment, each case was considered as 3 discrete regions: prostate and prostate bed (including seminal vesicles), lymph nodes, and bone. Each reader had to score 80% concurrence or higher with the predefined expert conclusion to proceed to the main BIE read. If this level was not reached, the reader underwent further training and was reassessed using 6 novel cases.
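The pass rule described above can be illustrated with a minimal sketch. It assumes that concurrence was pooled over all case-by-region calls (10 cases × 3 regions = 30 calls); the text does not state how the 80% figure was computed, so this pooling, the function name, and all data below are hypothetical.

```python
# Hypothetical sketch of the proficiency pass rule; not the study's scoring tool.
REGIONS = ("prostate/bed", "lymph nodes", "bone")
PASS_THRESHOLD = 0.80  # 80% concurrence, as stated in the text


def concurrence(reader_calls, expert_calls):
    """Fraction of (case, region) calls that match the expert conclusion."""
    matches = sum(reader_calls[key] == call for key, call in expert_calls.items())
    return matches / len(expert_calls)


# Made-up expert conclusions for 10 cases x 3 regions (True = positive).
expert = {(case, region): (case + i) % 2 == 0
          for case in range(10) for i, region in enumerate(REGIONS)}

# Made-up reader who disagrees with the expert on 5 of the 30 calls.
reader = dict(expert)
for key in list(expert)[:5]:
    reader[key] = not expert[key]

score = concurrence(reader, expert)
passed = score >= PASS_THRESHOLD
print(f"concurrence = {score:.1%}, passed = {passed}")
```

Under this pooling assumption, a reader could miss up to 6 of 30 regional calls and still proceed to the main read.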
BIE
The BIE was conducted at the American College of Radiology (ACR) Clinical Research Centre. The readers were blinded to the origins of the images, to any subject-specific information, and to the specifics of the subjects’ medical histories and all clinical evaluations. Reader-specific BIE randomization lists, which assigned each subject image a different randomization code for each reader, were generated by the ACR statistical group. To facilitate an intrareader analysis, 10% of the image datasets were randomly selected and assigned 2 codes so that they were read twice, with at least an overnight interval between readings.
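A reader-specific randomization list with a 10% reread subset might be generated along the following lines. This is a sketch only: the code format, the seeding scheme, and the function name `build_read_list` are assumptions, and the scheduling constraint (at least overnight between the two reads of an image) is not enforced here.

```python
import random


def build_read_list(image_ids, reread_fraction=0.10, seed=0):
    """Return a shuffled, coded read list for one reader (hypothetical scheme)."""
    rng = random.Random(seed)  # a different seed per reader gives different codes
    n_reread = round(len(image_ids) * reread_fraction)
    reread_ids = rng.sample(image_ids, n_reread)

    # Every image is read once; the selected subset appears a second time.
    reads = [(img, "first") for img in image_ids]
    reads += [(img, "reread") for img in reread_ids]
    rng.shuffle(reads)

    # Unique 4-digit codes hide the image identity from the reader.
    codes = rng.sample(range(1000, 10000), len(reads))
    return [{"code": c, "image": img, "read": kind}
            for c, (img, kind) in zip(codes, reads)]


image_ids = [f"img{i:03d}" for i in range(121)]  # made-up identifiers
reader1_list = build_read_list(image_ids, seed=1)
print(len(reader1_list), "coded reads for reader 1")
```

With 121 images, rounding 10% gives 12 rereads and 133 coded reads in total.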
The readers assessed each image dataset for evidence of focal uptake indicative of malignancy. After the identification of a focal area of uptake, the reader assessed the area using the interpretation guidelines for specific anatomic regions/structures. The reader categorized lesion or region locations as either positive, negative, indeterminate, or not assessed, and data were recorded in a standardized fashion on an electronic case report form depicting 124 possible anatomic locations to be evaluated in this way. Where the anatomic location data field on the case report form was outside the PET/CT scan field of view (e.g., brain in a pelvis-to-diaphragm scan), the reader would record this as “not assessed.” Immediately after the reading of a specific anatomic region, each read was locked to further changes.
Statistical Analysis
Primary SOT
Analyses of diagnostic performance were conducted at the lesion level, the region level (prostate, comprising residual prostate, prostate bed, and seminal vesicles; or extraprostatic, comprising any lymph node, bone, or soft-tissue metastasis), and the subject level. Only lesions that had been biopsied were included in the lesion-level analysis. Similarly, only regions that had been biopsied were included in the region-level analysis, and a region was considered positive only if it contained at least 1 lesion with a positive finding. For the subject-level analysis, a subject was considered positive only if the subject had undergone a biopsy and there was a positive finding in at least 1 lesion.
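The roll-up from lesion to region and subject level can be sketched as follows. The record layout and the name `roll_up` are hypothetical, and the input is assumed to contain biopsied lesions only, in line with the inclusion rule above.

```python
from collections import defaultdict


def roll_up(lesions):
    """Aggregate biopsied lesions into region- and subject-level status.

    lesions: iterable of (subject_id, region, positive) tuples, where
    positive is True when the lesion has a positive finding.
    """
    region_status = defaultdict(bool)   # defaults to False (negative)
    subject_status = defaultdict(bool)
    for subject, region, positive in lesions:
        # A region or subject is positive if at least 1 lesion is positive.
        region_status[(subject, region)] |= positive
        subject_status[subject] |= positive
    return dict(region_status), dict(subject_status)


# Made-up example records.
lesions = [
    ("S01", "prostate", False),
    ("S01", "prostate", True),
    ("S01", "extraprostatic", False),
    ("S02", "prostate", False),
]
regions, subjects = roll_up(lesions)
print(regions)
print(subjects)
```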
Positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, and detection rate were calculated where possible for individual readers and as an overall interpretation (2/3 readers’ concordance).
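For reference, these measures follow directly from the 2 × 2 table of reader calls against the SOT, and the overall interpretation is the call shared by at least 2 of the 3 readers. The sketch below uses made-up counts; detection rate is omitted because its definition is not given in the text, and the handling of a three-way split (returned as `None`) is an assumption.

```python
from collections import Counter


def majority_read(calls):
    """Return the call shared by at least 2 of 3 readers, else None."""
    call, count = Counter(calls).most_common(1)[0]
    return call if count >= 2 else None


def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 measures; None where the denominator is zero."""
    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Made-up counts, not study data.
metrics = diagnostic_metrics(tp=30, fp=10, fn=20, tn=40)
overall = majority_read(["positive", "positive", "negative"])
split = majority_read(["positive", "negative", "indeterminate"])
print(metrics, overall, split)
```

Returning `None` for an empty denominator mirrors the situation described later for the extraprostatic region, where specificity and NPV could not be derived.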
For the primary effectiveness analysis using histopathologic results from biopsy as the SOT, point estimates (expressed as percentages) were calculated, and a 1-sided exact binomial test was used to compare H0 (endpoint = 0.50) versus H1 (endpoint > 0.50) for each of the primary efficacy endpoints for each reader. For the assessment of effectiveness endpoints, indeterminate lesions were excluded from lesion- and subject-level analyses. At the region level, indeterminate lesions were excluded only for the region involved.
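A 1-sided exact binomial test against a null proportion of 0.50 can be computed from the binomial tail, as in the minimal sketch below. The function name and the example counts are hypothetical; the statistical software actually used in the study is not stated in the text.

```python
from math import comb


def binom_test_greater(successes, n, p0=0.5):
    """One-sided exact binomial P value: P(X >= successes) under proportion p0."""
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k)
               for k in range(successes, n + 1))


# Made-up example: 8 correct calls out of 10 evaluable lesions.
p_value = binom_test_greater(8, 10)
print(f"P = {p_value:.4f}")
```

A small P value supports the alternative that the endpoint exceeds 0.50 for that reader.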
Secondary SOT
Additional analyses of effectiveness were also performed using a secondary SOT derived from data obtained at Emory University, such as biopsy, other imaging approaches (e.g., CT, MRI, or bone scanning in the 3 mo before and up to 6 mo after 18F-fluciclovine PET scanning), or posttherapy decreases or continued increases in prostate-specific antigen measures.
The point estimates (expressed in percentages) of the sensitivity, specificity, PPV, NPV, and detection rate were presented for each reader separately. Analysis was conducted at the region and subject level.
Interreader Agreement
Cohen’s κ-statistic was used to assess pairwise agreement between 2 readers in the evaluation of interreader reproducibility. Fleiss’ κ-statistic (6) was used to assess interreader agreement among all 3 readers.
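Both statistics compare observed agreement with the agreement expected by chance. The sketch below implements the standard unweighted forms for categorical calls; the function names and example ratings are illustrative, and neither function guards against the degenerate case in which chance agreement equals 1.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' calls on the same items."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_exp = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings is a list of per-item lists, one call per rater."""
    n_items, n_raters = len(ratings), len(ratings[0])
    totals = Counter()
    p_bar = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        p_bar += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    p_bar /= n_items
    p_exp = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_exp) / (1 - p_exp)


# Made-up calls for illustration.
pairwise = cohen_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"])
three_way = fleiss_kappa([["pos", "pos", "neg"], ["neg", "neg", "pos"]])
print(pairwise, three_way)
```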
Intrareader Agreement
Cohen’s κ-statistic was used to assess intrareader reproducibility. Agreement between the first and the repeated read, performed on a random selection of 10% of the images, was presented for each reader.
Onsite Versus BIE Agreement
Cohen’s κ-statistic was used to assess agreement between the onsite reader and each of the blinded readers, and between the onsite reader and the majority opinion (2/3 readers).
Sensitivity analyses allocating indeterminate lesions as either positive or negative were performed.
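The effect of reallocating indeterminate calls can be illustrated as follows. This is a sketch with made-up calls; the label strings and function names are assumptions, and the default behavior (excluding indeterminate calls) corresponds to the primary analysis described earlier.

```python
def recode(calls, indeterminate_as):
    """Replace every indeterminate call with the given label."""
    return [indeterminate_as if c == "indeterminate" else c for c in calls]


def sens_spec(calls, truth):
    """Sensitivity and specificity; remaining indeterminate calls are excluded."""
    pairs = [(c, t) for c, t in zip(calls, truth) if c != "indeterminate"]
    tp = sum(c == "positive" and t for c, t in pairs)
    fn = sum(c == "negative" and t for c, t in pairs)
    tn = sum(c == "negative" and not t for c, t in pairs)
    fp = sum(c == "positive" and not t for c, t in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up reader calls and SOT results (True = malignant).
calls = ["positive", "indeterminate", "negative",
         "indeterminate", "negative", "positive"]
truth = [True, True, True, False, False, False]

excluded = sens_spec(calls, truth)
as_positive = sens_spec(recode(calls, "positive"), truth)
as_negative = sens_spec(recode(calls, "negative"), truth)
print(excluded, as_positive, as_negative)
```

In this toy example, treating indeterminate calls as positive raises sensitivity and lowers specificity, and treating them as negative does the reverse.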
RESULTS
Images
In total, 110 subjects (mean age, 67.4 y, and mean prostate-specific antigen level before scanning, 5.87 ng/mL) contributed 133 18F-fluciclovine PET/CT images for the BIE (Table 1). Of these, 121 were included in the first read and 12 in the intrareader agreement. Example fluciclovine cases evaluated in the study are shown in Figure 3.
TABLE 1. Subject Demographics and Baseline Characteristics
FIGURE 3. Example images evaluated in the study. (A) Common iliac lymph node (arrow). (B) Retroperitoneal aortocaval (midline) lymph node (arrow). (Left) PET-only images. (Right) Fused PET/CT images. Images in A were categorized as positive by all 3 readers and onsite local reader. Images in B were categorized as positive by 2 of 3 readers and onsite reader.
Primary Effectiveness Endpoints
Primary SOT
Lesion-level analysis outcomes for 18F-fluciclovine images compared with a histopathologic SOT for each reader and the overall interpretation are presented in Table 2, and a receiver-operating-characteristic space analysis for the 3 readers is provided in Supplemental Figure 1. Most evaluated lesions were in the prostate/bed. Overall, the lesion-level PPV, which reflects the readers’ ability to reliably classify uptake as malignant, was 70.5%. PPV and NPV were consistent across all readers and were broadly consistent with the onsite read. The sensitivity was highest for readers 1 and 2 (68.5% and 63.9%, respectively), whereas specificity was highest for reader 3 (83.6%).
TABLE 2. Lesion-Level 18F-Fluciclovine PET/CT BIE Outcomes Compared with Primary SOT
Table 3 shows the region- and subject-level BIE outcomes for each reader and the onsite read in comparison with histopathology. Overall, sensitivity was high in the prostate region (91.4%), but specificity was moderate (48.7%). In particular, the specificity of readers 1 and 2 was low (25.6% and 31.6%, respectively) compared with that of reader 3 (61.5%) but was in line with that of the onsite reader (30.8%). Extraprostatic PPV was high (92.0%). Histologic verification was not feasible for 18F-fluciclovine–negative extraprostatic cases and, therefore, specificity and NPV could not be derived for this region. Of the 121 18F-fluciclovine scans, 102 were included in the overall histopathology SOT analysis; overall across the 3 readers, 86 (84.3%) of these were read as positive.
TABLE 3. Region- and Subject-Level 18F-Fluciclovine PET/CT BIE Outcomes Compared with Primary SOT
Secondary SOT
Outcomes taking into account the secondary SOT are presented in Supplemental Table 2.
Interreader Agreement
A comparison of all lesion-level outcomes across all readers gave a κ-value of 0.54 (Table 4). Generally, readers 1 and 2 had higher interreader concordance than either did with reader 3. Subject-level interreader agreement between readers 1 and 2 was 93.4% compared with 77.7% between readers 1 and 3 and 79.3% between readers 2 and 3.
TABLE 4. Lesion-, Region-, and Subject-Level 18F-Fluciclovine PET/CT Interreader Agreement
Secondary Effectiveness Endpoints
Intrareader Agreement
Table 5 presents the intrareader agreement at lesion, region, and subject levels. Lesion-level intrareader concordance was high for all readers (97.8%, 96.9%, and 99.1% for readers 1, 2, and 3, respectively). In the prostate region, intrareader concordance was 100% for readers 1 and 2 and 91.7% for reader 3. In the extraprostatic region, reports were concordant in 83.3% of cases read by readers 1 and 3 and 75.0% of cases read by reader 2. Subject-level intrareader concordance was 100% for both readers 1 and 2 and 83.3% for reader 3 (1 subject was assigned indeterminate on the first read and positive on the second read).
TABLE 5. Lesion-, Region-, and Subject-Level 18F-Fluciclovine PET/CT Intrareader Agreement
Concordance with Onsite Read
The concordance between the BIE and the onsite reader exceeded 75% for every reader at the lesion, region, and subject levels. The degree of agreement between the BIE and the onsite read was 95.4%, 95.4%, and 94.2% at the lesion level for readers 1, 2, and 3, respectively. In the prostate region, concordance between the BIE and onsite read was 90.9%, 90.1%, and 76.9% for readers 1, 2, and 3, respectively, and for the extraprostatic region these were 81.8%, 82.6%, and 76.9%, respectively.
DISCUSSION
This prospective BIE study demonstrated that blinded, naïve readers, after receiving limited specific training, were able to identify areas of recurrent cancer on 18F-fluciclovine PET/CT images with accuracy comparable to that of 18F-fluciclovine–experienced readers who had access to histopathologic findings and subject data.
Approximately 35% of men with prostate cancer will develop biochemical recurrence (7,8), and approximately 25% of these will progress to metastatic disease, which is associated with significantly increased morbidity and mortality (9). Determining who is at risk of progression is paramount to provide local, targeted, salvage therapy to those who might be cured, but also to limit exposure to the significant morbidity associated with androgen-deprivation therapy in subjects unlikely to benefit (10,11). A clinical need exists for a technique to stratify those at risk of progression. 18F-Fluciclovine is Food and Drug Administration–approved for use in BCR prostate cancer (1) where it has established efficacy and safety (2–4,12). Its potential to guide clinical decision making has been demonstrated previously where its use was shown to lead to augmentation of the planned treatment in 73% of subjects with BCR prostate cancer scheduled for salvage radiotherapy (13).
The present study demonstrates that naïve readers were able to achieve acceptable diagnostic performance and reproducibility when staging BCR prostate cancer, which is of clinical importance as the use of 18F-fluciclovine migrates from expert technical sites to general clinical use. Although differences between readers were noted, intrareader outcomes were generally high and similar for all 3 readers. Readers 1 and 2 had high concordance with the onsite reads, particularly with respect to 18F-fluciclovine–positive lesions. Reader 3 did not detect as many of the onsite-detected 18F-fluciclovine–positive lesions. In this reader’s case, the high concordance at lesion level was largely driven by high concordance between the reader’s assessment of negative uptake compared with the onsite read.
When secondary SOT data were considered, the BIE outcomes were again similar to those of the onsite reader in terms of PPV and sensitivity. However, a higher frequency of false-positive and false-negative findings, albeit small, led to a lower diagnostic sensitivity in the BIE than for the onsite reader. It is probable that knowledge of the patient’s clinical information in a nonblinded approach leads to more accurate results. In addition, with continued experience and learning from follow-up of subjects, diagnostic sensitivity would be expected to improve over time. The lower specificity observed in the prostate region is a result of false-positive imaging findings compared with reports of biopsies taken from patients previously treated with radiotherapy. It aligns with the results of expert readers and is perhaps a result of SUV overlap between cancer and benign pathology (5,14). It has been suggested that the conjoint use of 18F-fluciclovine PET/CT with conventional or multiparametric MRI would be a valuable technique to optimize specificity in this setting (14).
The data further suggest a high degree of inter- and intrareader concordance among readers given the same set of scans to read and trained in the same manner. Interreader concordance was greatest among prostate/bed lesions (the prostate/bed is also the most common site of BCR prostate cancer), but was also moderate-to-substantial in extraprostatic regions, which is of clinical importance because of the potential of such findings to influence treatment selection. In the same setting of BCR prostate cancer, a previous study of the interpretation of 18F-choline PET/CT results reported a Fleiss’ κ-coefficient of 0.550 for the concordance of 4 blinded physicians reading the prostate/prostatic bed region (15). Unlike the readers in our study, those readers had at least 1 y of experience in interpreting images with the radiotracer. Our readers, who had no experience of 18F-fluciclovine image interpretation before the training, achieved a remarkably similar interreader agreement coefficient of 0.50 in the same region.
Interestingly, we observed a low κ-coefficient despite correspondingly high agreement in our interreader lesion analysis, which might suggest that the κ-value is unsuitable for these data and represents a further limitation. As previously described, the κ-statistic is not reliable for rare observations because it is affected by the prevalence of the finding under consideration (16). In some instances in the present study, few true-positive findings were made in an assessment of typically thousands of areas. For such rare findings, very low κ-values may not necessarily reflect low rates of overall agreement and accordingly are consistent with the high-agreement, low-κ paradox (17).
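The dependence of κ on prevalence can be shown with a small numeric sketch. The counts below are made up and describe two hypothetical readers assessing 1,000 locations; both scenarios have identical raw agreement, and only the prevalence of positive calls differs.

```python
def agreement_and_kappa(both_pos, a_only, b_only, both_neg):
    """Raw agreement and Cohen's kappa from a 2x2 table of two readers' calls."""
    n = both_pos + a_only + b_only + both_neg
    p_obs = (both_pos + both_neg) / n
    a_pos = (both_pos + a_only) / n  # reader A's positive rate
    b_pos = (both_pos + b_only) / n  # reader B's positive rate
    p_exp = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Rare finding: positives are called at only about 0.6% of locations.
rare_agree, rare_kappa = agreement_and_kappa(1, 5, 5, 989)

# Balanced finding: positives are called at 50% of locations.
bal_agree, bal_kappa = agreement_and_kappa(495, 5, 5, 495)

print(f"rare:     agreement {rare_agree:.1%}, kappa {rare_kappa:.2f}")
print(f"balanced: agreement {bal_agree:.1%}, kappa {bal_kappa:.2f}")
```

Both scenarios show 99% raw agreement, yet κ is roughly 0.16 when the finding is rare and 0.98 when it is balanced, because chance agreement on negatives is already close to 99% in the rare case.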
CONCLUSION
Use of standardized interpretation methodology and a specific training program for the assessment of 18F-fluciclovine PET/CT images enables naïve readers to achieve acceptable diagnostic performance and reproducibility when staging recurrent prostate cancer.
DISCLOSURE
This study was sponsored by BED. Matthew P. Miller, Albert Chau, and Penelope Ward are employees/shareholders of BED, who sponsored the study. Lale Kostakoglu, Jian Qin Yu, and Daniel Pryma received a reader training fee from ACR for the present study. Daniel Pryma has received grants from Progenics, Bayer, and Siemens outside the present work. Bonnie Clarke received consultancy fees from BED during and outside the present work. No other potential conflict of interest relevant to this article was reported.
Acknowledgments
We thank Tore Bach-Gansmo, Tronde Velde Bogsrud, Funmilayo I. Tade, Oluwaseun Odewole, and David M. Schuster for the provision of 18F-fluciclovine cases used in this study. We thank Charlie Apgar, Emilie Connors, and Christine Davis of the ACR Imaging Core Lab for assistance with conducting the BIE. Writing support was provided by Dr. Catriona Turnbull of Correlate Medical Ltd. and funded by BED.
Footnotes
Published online Apr. 6, 2017.
- © 2017 by the Society of Nuclear Medicine and Molecular Imaging.
- Received for publication December 8, 2016.
- Accepted for publication March 22, 2017.